Jakub Arnold's Blog

requestAnimationFrame and useEffect vs useLayoutEffect

Sat, 09 May 2020 10:00:00 +0200

While trying to implement an animated number counter, I stumbled upon an interesting issue: useEffect and requestAnimationFrame don’t play together nicely. It led me down a rabbit hole of confusion, but luckily I wasn’t the first one to run into this, and react-use has already resolved this exact issue in their useRaf hook. This post is a short explanation of the problem and of why useLayoutEffect fixes it.

`useEffect`

The useEffect hook, available since React 16.8, gives function components a way to do what lifecycle methods do in class components. It can act as componentDidMount, componentDidUpdate and componentWillUnmount while keeping the logic neatly in one place. Let’s quickly go over a few different use cases.

Running code after each render:

import React, { useEffect, useState } from "react"

function Counter() {
  const [counter, setCounter] = useState(0)

  // No dependency list: the effect runs after every render
  useEffect(() => {
    document.title = `Counter: ${counter}`
  })

  return (
    <button onClick={() => setCounter(counter + 1)}>
      Clicked {counter} times
    </button>
  )
}

While on some level this acts as componentDidMount and componentDidUpdate, what it really says is “queue this function to run after each render”. As a quick side note, you might be tempted to write the arrow function without braces like this (it might be more tempting with something other than an assignment, such as a function call):

useEffect(() => (document.title = `Counter: ${counter}`))

but if you try it React will crash with TypeError: destroy is not a function. This might be surprising at first, but the difference is the arrow function is now actually returning the value of the assignment, that is the string Counter: ${counter}. In order for useEffect to also handle cleanup (and be able to replace componentWillUnmount), it has a mechanism for the user to provide a cleanup function. This is actually what we did by accident, because the cleanup function is to be returned from the effect. But this shorter arrow function syntax is equivalent to writing the following:

useEffect(() => {
  return (document.title = `Counter: ${counter}`)
})

Now it should be clearer why React complains. It expects the return value to be either undefined, in which case it doesn’t do any cleanup, or a function which can be called. But we’re returning a string, which is neither, and React will crash when it tries to call it.
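To see the difference outside of React, here is a small plain JavaScript sketch comparing the two arrow function forms. The `runEffect` helper and the `doc` object are hypothetical stand-ins that only imitate the check described above; this is not how React is implemented.

```javascript
// Hypothetical stand-in for how an effect's return value is treated:
// undefined means "no cleanup", a function is kept as the cleanup,
// and anything else is reported as invalid.
function runEffect(effect) {
  const result = effect()
  if (result === undefined) return "no cleanup"
  if (typeof result === "function") return "cleanup registered"
  return `invalid return value: ${typeof result}`
}

const doc = { title: "" } // stand-in for `document`
const counter = 3

// With braces the assignment is a statement and the function returns undefined.
const withBraces = () => {
  doc.title = `Counter: ${counter}`
}

// Without braces the assignment is an expression and its value is returned.
const withoutBraces = () => (doc.title = `Counter: ${counter}`)

console.log(runEffect(withBraces)) // "no cleanup"
console.log(runEffect(withoutBraces)) // "invalid return value: string"
console.log(runEffect(() => () => {})) // "cleanup registered"
```

The only difference between the first two calls is the pair of braces, which is what makes this mistake easy to miss.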

`useEffect` cleanup

Our previous example didn’t really have any need for cleanup. We’ll switch to using setInterval to create an auto-incrementing counter where we can also control the speed at which it increments.

Let’s start with a basic structure that is buggy and we’ll incrementally fix it. We’ll add a second state variable speed which will control the timeout parameter of a setInterval. We’ll also add two buttons to control the speed of the timer.

function Counter() {
  const [counter, setCounter] = useState(0)
  const [speed, setSpeed] = useState(1000)

  useEffect(() => {
    setInterval(() => {
      // Similarly to `this.setState` in class components, `setCounter`
      // can accept a function that takes in the current value and
      // returns a new value
      setCounter(x => x + 1)
    }, speed)
  })

  return (
    <p>
      Counter: {counter} ... Speed: {speed}
      <button onClick={() => setSpeed(speed + 100)}>+</button>
      {/* The `Math.max` is here simply so we don't set the speed to `0` */}
      <button onClick={() => setSpeed(Math.max(100, speed - 100))}>-</button>
    </p>
  )
}

If you run this code, you’ll see the issue very quickly (you can try it here, but be careful, it gets very laggy very quickly). The counter doesn’t increment by 1 every second. It increments by 1 the first second, then by 2, then by 3, then by 4, and so on. This is because by default useEffect runs after every single render, and we only ever set new intervals without clearing the old ones.

A quick fix would be to return a cleanup function:

useEffect(() => {
  const timerId = setInterval(() => {
    setCounter(x => x + 1)
  }, speed)

  return () => clearInterval(timerId)
})

But this is not ideal either. Each time the setInterval ticks, it calls setCounter, which in turn causes the component to re-render. But our useEffect also runs on each render, which means the first tick of the timer causes a re-render, which in turn re-runs the effect, which clears the first interval and sets a new one. While the code seemingly does what it’s supposed to, it’s clearly wasteful to clear the interval on each render. We really only need to change it when speed changes. This is why useEffect has a second argument for a list of dependencies. Here’s the complete component:

function Counter() {
  const [counter, setCounter] = useState(0)
  const [speed, setSpeed] = useState(1000)

  useEffect(() => {
    console.log("Setting up a new interval")
    const timerId = setInterval(() => {
      setCounter(x => x + 1)
    }, speed)

    return () => clearInterval(timerId)
  }, [speed])

  return (
    <p>
      Counter: {counter} ... Speed: {speed}
      <button onClick={() => setSpeed(speed + 100)}>+</button>
      {/* The `Math.max` is here simply so we don't set the speed to `0` */}
      <button onClick={() => setSpeed(Math.max(100, speed - 100))}>-</button>
    </p>
  )
}

You can see it in action here.

I’ve also added a console.log to make it clear when the new setInterval is being set.

It is extremely important to make a small note about closures here. If we were to touch counter (for example setCounter(counter + 1)) inside the setInterval callback instead of passing in x => x + 1 then the closure would actually hold onto the variable counter at the time of its creation, and not get updated with the new value until speed changes (at which point the closure is re-created). We could potentially fix this by specifying [counter, speed] as deps, but that would be essentially re-creating the previous case where the interval only ever runs once.
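The closure behavior can be reproduced without React at all. The following is a minimal sketch in plain JavaScript, where `createState`, `makeStaleTick` and `makeUpdaterTick` are hypothetical helpers that only mimic a state setter accepting either a value or an updater function.

```javascript
// Minimal stand-in for a piece of state whose setter accepts either a new
// value or an updater function, similar to the setter returned by useState.
function createState(initial) {
  let value = initial
  return {
    get: () => value,
    set: next => {
      value = typeof next === "function" ? next(value) : next
    },
  }
}

// The callback closes over `counter` as it was when the callback was created.
function makeStaleTick(state) {
  const counter = state.get() // captured once, never refreshed
  return () => state.set(counter + 1)
}

// The callback asks for the current value every time it runs.
function makeUpdaterTick(state) {
  return () => state.set(x => x + 1)
}

const stale = createState(0)
const staleTick = makeStaleTick(stale)
staleTick()
staleTick()
staleTick()

const fresh = createState(0)
const updaterTick = makeUpdaterTick(fresh)
updaterTick()
updaterTick()
updaterTick()

console.log(stale.get()) // 1, every tick computed 0 + 1
console.log(fresh.get()) // 3
```

The stale version keeps writing the same value, which is why the counter would appear stuck until the closure is re-created.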

By specifying [speed] as the dependency of our effect we can control when it gets re-run. This is similar to diffing props within componentDidUpdate, but React will do that automatically for us.
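As a rough mental model of that automatic diffing, the sketch below compares two dependency lists entry by entry using Object.is. The `depsChanged` function is a hypothetical illustration of the idea, not React’s actual code.

```javascript
// Sketch of a dependency comparison: the effect should re-run when there is
// no dependency list at all, or when any entry differs according to Object.is.
function depsChanged(prevDeps, nextDeps) {
  if (prevDeps === undefined || nextDeps === undefined) return true
  if (prevDeps.length !== nextDeps.length) return true
  return nextDeps.some((dep, i) => !Object.is(dep, prevDeps[i]))
}

console.log(depsChanged([1000], [1000])) // false, speed did not change
console.log(depsChanged([1000], [900])) // true, speed changed
console.log(depsChanged([], [])) // false, nothing to compare
console.log(depsChanged(undefined, undefined)) // true, no list means every render
```

Under this model an empty list never reports a change, and omitting the list always does.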

If we didn’t want to control the speed, we could simply pass in [] as an empty list of dependencies, which would make the useEffect equivalent to componentDidMount and not be affected by re-renders.

useEffect(() => {
  const timerId = setInterval(() => {
    setCounter(x => x + 1)
  }, 1000) // fixed interval, since the effect no longer depends on `speed`

  return () => clearInterval(timerId)
}, [])

But since we care about controlling the speed, we have to leave out this option.

Pausing and timing issues

We’ll modify our example a little bit to add Pause/Resume controls instead of controlling the speed, as this is where I initially ran into the issue with requestAnimationFrame.

function Counter() {
  const [counter, setCounter] = useState(0)
  const [isPaused, setIsPaused] = useState(true)

  useEffect(() => {
    if (!isPaused) {
      const timerId = setInterval(() => {
        setCounter(x => x + 1)
      }, 1)

      return () => clearInterval(timerId)
    }
  }, [isPaused])

  return (
    <p>
      Counter: {counter} ...{" "}
      <button onClick={() => setIsPaused(!isPaused)}>
        {isPaused ? "Resume" : "Pause"}
      </button>
    </p>
  )
}

You can run the code here.

We removed the speed state and instead added an isPaused state variable, which controls whether the counter is being increased.

But setInterval is not the right way to do animations, as it doesn’t synchronize with the browser’s re-painting mechanism. This is where requestAnimationFrame comes in, which basically tells the browser to run the given callback before the next repaint.

Before we switch to it, let us first rewrite the component so that it uses setTimeout instead of setInterval, as that will be nearly identical to the structure of the correct version with requestAnimationFrame.

Interestingly enough, this example already contains the same issue as the final version with requestAnimationFrame, but it is much harder to trigger.

function Counter() {
  const [counter, setCounter] = useState(0)
  const [isPaused, setIsPaused] = useState(true)

  useEffect(() => {
    if (!isPaused) {
      let timerId

      const f = () => {
        setCounter(x => x + 1)
        // Since `f` is only called in a `setTimeout` and not
        // `setInterval`, it needs to re-schedule itself to run
        // again after it finishes.
        timerId = setTimeout(f, 1)
      }

      // The initial run is also scheduled via `setTimeout`
      // to keep this in line with how `requestAnimationFrame`
      // works, and to make the code overall more consistent
      // in the way it executes.
      timerId = setTimeout(f, 1)

      return () => clearTimeout(timerId)
    }
  }, [isPaused])

  return (
    <p>
      Counter: {counter} ...{" "}
      <button onClick={() => setIsPaused(!isPaused)}>
        {isPaused ? "Resume" : "Pause"}
      </button>
    </p>
  )
}

You can try the code here.

A few things changed in this version. We extracted our update logic into a separate function f, which is first invoked using setTimeout(f, 1), and after that it schedules itself to run again as it finishes, again using setTimeout(f, 1). This might seem strange, but it is exactly how requestAnimationFrame is used.

Now for the final version with requestAnimationFrame, we simply use requestAnimationFrame(f) in place of setTimeout(f, 1), and cancelAnimationFrame in place of clearTimeout. This code tells the browser to run f before it performs its next repaint, which in most cases happens 60 times per second, giving us a nice and smooth animation.

function Counter() {
  const [counter, setCounter] = useState(0)
  const [isPaused, setIsPaused] = useState(true)

  useEffect(() => {
    if (!isPaused) {
      let timerId

      const f = () => {
        setCounter(x => x + 1)
        timerId = requestAnimationFrame(f)
      }

      timerId = requestAnimationFrame(f)

      return () => cancelAnimationFrame(timerId)
    }
  }, [isPaused])

  return (
    <p>
      Counter: {counter} ...{" "}
      <button onClick={() => setIsPaused(!isPaused)}>
        {isPaused ? "Resume" : "Pause"}
      </button>
    </p>
  )
}

You can try the code here.

Now all it takes is to click the Resume / Pause button really fast, and in a couple of tries you should see the counter will ignore the button and keep increasing despite being paused.

This seems very strange, since we’re telling React to clean up the request after the button is pressed. The thing is, despite what people say, useEffect is not actually the same as componentDidUpdate. The documentation actually mentions this at one point, but it didn’t occur to me at first what consequences it would have. The problem is that the effect passed to useEffect is not run synchronously after the DOM is updated from the render call, but rather at some point later. This means the browser isn’t blocked by the update logic and the app feels more responsive. Specifically in this case, the browser is able to re-paint before the effect (or its cleanup) is run.

In the case of document.title = ... we didn’t really care if the title was updated a few milliseconds later, but in the case of requestAnimationFrame it does make a difference. The problem is a new animation frame will be requested before the cleanup function of useEffect is called, since the cleanup is not run synchronously. This is essentially a timing issue, where sometimes the browser will re-paint right between our component rendering to the DOM, and the cleanup function being called. This means our f gets a chance to schedule itself again before it is cleaned up, and essentially escapes our cleanup logic.
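One way to picture this interleaving is a small simulation in plain JavaScript. It is only a sketch under an assumption: the deferred cleanup happens to run after the current frame’s callback has started, but before that callback requests the next frame. The names `createFakeFrames`, `simulate` and `deferredCleanup` are hypothetical, and the fake scheduler merely stands in for requestAnimationFrame; none of this is React’s actual scheduling code.

```javascript
// A tiny fake frame scheduler standing in for requestAnimationFrame.
function createFakeFrames() {
  let nextId = 1
  const pending = new Map()
  return {
    request(callback) {
      const id = nextId++
      pending.set(id, callback)
      return id
    },
    cancel(id) {
      pending.delete(id)
    },
    // Run every callback that was pending when the frame started.
    runFrame() {
      const batch = [...pending.values()]
      pending.clear()
      batch.forEach(callback => callback())
    },
    pendingCount() {
      return pending.size
    },
  }
}

// Returns how many frame requests are still pending after "Pause" is clicked
// and one more frame has passed. 0 means the loop stopped, 1 means it escaped.
function simulate(cleanupIsDeferred) {
  const frames = createFakeFrames()
  let deferredCleanup = null // a cleanup that has been queued but not run yet
  let timerId

  const f = () => {
    // Assumption: the overdue cleanup runs here, while this frame's
    // callback is already executing, so cancelling `timerId` does nothing.
    if (deferredCleanup) {
      deferredCleanup()
      deferredCleanup = null
    }
    timerId = frames.request(f)
  }
  timerId = frames.request(f)
  const cleanup = () => frames.cancel(timerId)

  if (cleanupIsDeferred) {
    deferredCleanup = cleanup // queued, like a useEffect cleanup
  } else {
    cleanup() // run immediately, like a useLayoutEffect cleanup
  }
  frames.runFrame()
  return frames.pendingCount()
}

console.log(simulate(true)) // 1, the loop escaped the cleanup
console.log(simulate(false)) // 0, the loop was stopped
```

In the deferred case the cleanup cancels a request that has already fired, and the new request made right afterwards is never cancelled.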

Lucky for us, there is an easy fix. Apart from useEffect, there is also a useLayoutEffect hook which has exactly the same arguments and works the same way, except it runs synchronously after the DOM is updated. This is exactly what we need, as it will cancel the current animation frame request before a new one has a chance to be queued.

The fixed code is exactly the same, except for useEffect being replaced by useLayoutEffect:

import React, { useLayoutEffect, useState } from "react"

function Counter() {
  const [counter, setCounter] = useState(0)
  const [isPaused, setIsPaused] = useState(true)

  // Same effect as before, but run synchronously after the DOM update
  useLayoutEffect(() => {
    if (!isPaused) {
      let timerId

      const f = () => {
        setCounter(x => x + 1)
        timerId = requestAnimationFrame(f)
      }

      timerId = requestAnimationFrame(f)

      return () => cancelAnimationFrame(timerId)
    }
  }, [isPaused])

  return (
    <p>
      Counter: {counter} ...{" "}
      <button onClick={() => setIsPaused(!isPaused)}>
        {isPaused ? "Resume" : "Pause"}
      </button>
    </p>
  )
}

You can try the code here.

Conclusion and references

This article is a good example of why reading documentation and paying attention to detail is important. When learning about hooks, I remember hearing something along the lines of “useEffect fires the effect asynchronously later”, but it didn’t immediately prompt me to ask whether that could cause any issues, or what other use cases there are for useLayoutEffect. The example I’ve seen people mention with it over and over again is resizing windows or DOM mutations, where useEffect would cause a flicker in the UI and useLayoutEffect wouldn’t. Interestingly, this is the same problem as the one we’re facing with requestAnimationFrame, as in both cases we want to do something before the browser has a chance to repaint. Only in the case of requestAnimationFrame the repaint does more than cause a UI flicker: it breaks our code.


SSH Tunnel - Local, Remote and Dynamic Port Forwarding

Mon, 04 May 2020 10:00:00 +0200

SSH tunneling is an extremely useful feature of SSH that is very often googled, but less often understood well enough to use without a reference. In this post I hope to explain it in such a way that you’ll have no confusion about when to use SSH’s local, remote, or even dynamic port forwarding. In essence, port forwarding allows SSH to securely create an encrypted communication channel (a tunnel) between two computers on the network. We can use this channel to run commands on the remote server, expose a local port on a remote computer, expose a remote port on the local computer, or route traffic via a SOCKS proxy (more on this later).

Background

But first a tiny bit of background on how SSH works and why it’s secure. If you just want to get to the practical bits, feel free to skip this section and jump straight to Local Port Forwarding. You don’t need to understand it to use SSH tunnels in practice.

There are three kinds of cryptographic algorithms used at different stages: Diffie-Hellman for key exchange, RSA for signatures and authentication, and AES for encrypting the traffic (or other algorithms depending on configuration). If you’ve ever configured nginx and run into something called dhparam or ssl_dhparam, the dh in there stands for the Diffie-Hellman algorithm, which is an amazingly simple algorithm to exchange a secret key over an insecure communication channel, without any prior knowledge. If you understand exponentiation (e.g. $2^{10} = 1024$) and modulo (e.g. $16 \mod 5 = 1$) you can understand Diffie-Hellman, as it’s only a few steps that could be done even on paper.
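To show how few steps are involved, here is a toy Diffie-Hellman exchange in JavaScript using BigInt. The numbers (p = 23, g = 5 and the two private values) are deliberately tiny illustration values chosen for this sketch; real exchanges use primes that are thousands of bits long, and `modPow` is a helper written for this example.

```javascript
// Modular exponentiation by repeated squaring, using BigInt.
function modPow(base, exponent, modulus) {
  let result = 1n
  base %= modulus
  while (exponent > 0n) {
    if (exponent & 1n) result = (result * base) % modulus
    base = (base * base) % modulus
    exponent >>= 1n
  }
  return result
}

// Public parameters, known to everyone including an eavesdropper.
const p = 23n // prime modulus
const g = 5n // generator

// Private values, never sent over the network.
const alicePrivate = 6n
const bobPrivate = 15n

// Each side sends g^private mod p to the other side.
const alicePublic = modPow(g, alicePrivate, p)
const bobPublic = modPow(g, bobPrivate, p)

// Each side raises the received value to its own private value.
const aliceSecret = modPow(bobPublic, alicePrivate, p)
const bobSecret = modPow(alicePublic, bobPrivate, p)

console.log(alicePublic, bobPublic) // 8n 19n
console.log(aliceSecret, bobSecret) // 2n 2n
```

Both sides end up with the same secret even though only the public values 8 and 19 ever crossed the wire.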

The issue with Diffie-Hellman is that while you can exchange a secret, you don’t really know who you’re exchanging it with, so on its own it is vulnerable to a man-in-the-middle attack (someone could pretend to be the server and exchange the key with you instead). That’s where one more step comes in: the server uses its private key to sign a hash of some of the Diffie-Hellman parameters (check out section 8 of the RFC on what exactly gets signed), and the client then verifies the host’s signature using the host’s public key. This is where SSH asks you to verify the host fingerprint, which is the fingerprint of its public key, and if you say yes, it means you’re confirming the server truly is who it says it is (and not an attacker), and the key exchange can continue. If you always say yes without verifying the host key, you’re vulnerable to a man-in-the-middle attack.

After the server’s authenticity is confirmed, the client and the server use the session key negotiated via Diffie-Hellman to encrypt all of the traffic between them. You might be wondering why we don’t use the already existing RSA keys (public/private keypair) of the client/server to encrypt the traffic. The answer is simple: asymmetric encryption is slow. Instead, SSH uses symmetric encryption (e.g. AES) to encrypt the traffic.

Lastly, the client is authenticated to the server using its RSA keypair. In the current version of the protocol, the client uses its private key to sign a piece of data that includes the session identifier, and the server checks that signature against the client’s public key (taken from ~/.ssh/authorized_keys). If the signature is valid, the client has proven that it owns the private key, and it is now authenticated. (Note that there are a few technical details, such as exactly which fields get signed, but those are not important for understanding the overall flow.)

Local Port Forwarding

The first forwarding mode we’ll look at is local port forwarding with the -L flag. It’s called local because it allows us to forward connections from a local port to a different port on another computer on the network, using a secure SSH connection.

Say that you have a database (e.g. PostgreSQL) running on a server example.com on port 5432. The server is configured in such a way that only the SSH port 22 is open, and thus you can’t connect directly via psql -h example.com -p 5432. You could SSH to the server and run psql -h localhost -p 5432 on there, but what if you wanted to use a GUI client for the database, and connect to the server directly?

With SSH you can simply forward an arbitrary local port, say 4000, to the port 5432 on the server, but in such a way that the connection to 5432 would come as if from inside the server, and thus would be allowed. To do this we run:

ssh -N -L 4000:localhost:5432 user@example.com

We’ll use the -N flag with all commands, which tells SSH to not start a shell (or execute a given command) and only forward ports. Personally I find this useful to distinguish which SSH sessions I use for forwarding and which ones are just regular shell connections. You can of course forward while starting a remote shell (just omit the -N flag).

The above command will connect to user@example.com and start forwarding the local port 4000 to localhost:5432 on the server. This means we can now run psql -h localhost -p 4000, and as psql establishes a connection to localhost on port 4000, SSH will securely forward the connection to example.com, where it connects to localhost:5432. This way psql doesn’t even know it’s connecting to a database running on a far away server.

One interesting tip is that we can forward to something other than localhost on example.com. Say that we have a second host named foobar.org, which is accessible only from the example.com server, but not from your machine.

ssh -N -L 4000:foobar.org:5432 user@example.com

This way a connection to localhost:4000 on your machine would get forwarded through example.com to connect to foobar.org on port 5432. In theory, you could also use this to bypass a firewall blocking direct connections from your computer, but dynamic port forwarding solves this problem more naturally by creating a SOCKS proxy (as we’ll see shortly).
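Since the three parts of the -L argument are easy to mix up, here is a small JavaScript sketch that splits such a spec into named fields. `parseForwardSpec` is a hypothetical helper written for this post, not part of any SSH tooling, and it assumes the spec contains no bracketed IPv6 addresses.

```javascript
// Hypothetical helper: split a forwarding spec of the form
// [bind_address:]port:host:hostport into its parts.
// Assumes no bracketed IPv6 addresses.
function parseForwardSpec(spec) {
  const parts = spec.split(":")
  if (parts.length !== 3 && parts.length !== 4) {
    throw new Error(`Unexpected forwarding spec: ${spec}`)
  }
  // Read from the right so the optional bind address comes last.
  const [hostPort, host, listenPort, bindAddress = null] = parts.reverse()
  return {
    bindAddress, // where SSH listens, null when not given
    listenPort: Number(listenPort), // port opened on the listening side
    host, // destination as seen from the other end of the tunnel
    hostPort: Number(hostPort), // destination port
  }
}

console.log(parseForwardSpec("4000:localhost:5432"))
console.log(parseForwardSpec("4000:foobar.org:5432"))
```

The important detail is that `host` is resolved on the far end of the tunnel, which is why localhost in the first example means example.com itself.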

Remote Port Forwarding

While local forwarding allows us to forward local connections to a remote port, with remote port forwarding we can accept connections on a remote server, and forward those to a local port on our machine. Say that we have a folder we want to share with a friend via our example.com server, but we don’t want to copy the files over. We could start up a web server using Python:

$ python -m http.server

which creates a simple HTTP server on port 8000 that serves files from the current directory. We could then use SSH to remotely forward port 4000 on the server example.com to localhost:8000 as follows:

ssh -N -R 4000:localhost:8000 user@example.com

Now you can tell your friend to go to http://example.com:4000, and SSH will accept their connection and forward it to localhost:8000 on your computer. There is one small catch, which is that forwarding ports like this requires you to edit the configuration of SSH on the server. Specifically, you need to add (or edit) GatewayPorts yes in /etc/ssh/sshd_config and restart the SSH daemon (via sudo systemctl restart sshd). Otherwise SSH binds the forwarded port only to the server’s loopback interface, and your friend won’t be able to reach it from outside.

Services like ngrok.com basically give you a fancy UI for remote port forwarding. They give you a CLI which you can run to make a local port available on the internet via a publicly accessible subdomain. If you have your own server (which you can get for free on AWS/GCP, or for a few dollars per month on many providers), you can do exactly the same thing with a single SSH command using remote port forwarding, without any limitations, and save yourself some money :)

Dynamic Port Forwarding

The last type of forwarding is called dynamic port forwarding, which is perhaps a slightly confusing name, because the way you use it is different from the two previous forwarding mechanisms. With dynamic port forwarding you only specify the local port to bind to using the -D parameter, and SSH will then determine where to forward connections based on the SOCKS protocol. The way this works is that SSH creates a SOCKS server which acts as a proxy which you can use in other applications.
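To make “based on the SOCKS protocol” a bit more concrete, the JavaScript sketch below builds the bytes of a SOCKS5 CONNECT request for a domain name, following the request layout in RFC 1928. This is the message in which the application tells the proxy where it wants to go. `buildSocks5ConnectRequest` is a name made up for this example, and the sketch leaves out the method negotiation that precedes the request as well as the proxy’s reply.

```javascript
// Build a SOCKS5 CONNECT request for a domain name (layout from RFC 1928).
function buildSocks5ConnectRequest(host, port) {
  const hostBytes = new TextEncoder().encode(host)
  if (hostBytes.length > 255) throw new Error("Host name too long")
  return Uint8Array.from([
    0x05, // protocol version 5
    0x01, // command: CONNECT
    0x00, // reserved
    0x03, // address type: domain name
    hostBytes.length, // length of the domain name
    ...hostBytes, // the domain name itself
    port >> 8, // destination port, high byte
    port & 0xff, // destination port, low byte
  ])
}

const request = buildSocks5ConnectRequest("private.example.com", 80)
console.log(request.length) // 26
console.log(Array.from(request.slice(0, 5))) // [ 5, 1, 0, 3, 19 ]
```

Because the destination travels inside each request, a single -D port can serve connections to any number of different hosts.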

Let’s say you still have the example.com server with the SSH port open. It is also part of a private network with other servers on it, say a website private.example.com, which is not accessible directly from the internet, but is accessible from example.com. First we connect to example.com with dynamic port forwarding:

ssh -N -D 5000 user@example.com

To connect to private.example.com we need to configure the web browser to use our SOCKS proxy. In Firefox this can be done with Network Settings -> Manual proxy configuration -> SOCKS host and select SOCKS v5 and set SOCKS Host to localhost and Port to 5000. You can also check Proxy DNS when using SOCKS v5 to resolve DNS using your SOCKS proxy, instead of resolving the hostname on your machine prior to making the request.

After this, you can just press OK and type private.example.com in the address bar and hit enter, and SSH will do the rest. Specifically it will connect to localhost:5000 via the SOCKS protocol and forward your request via the server to the website private.example.com. In practice, this is as if you used a VPN to connect to the private network. The downside is you need to configure your browser (and any other program) which you want to connect via the proxy. It doesn’t connect your whole computer inside the network as a VPN program could, but this could also be considered a benefit if you just want to access something in isolation.

If you’re using Chrome (or Chromium), you can use this nifty one-liner to start a new instance with the SOCKS configuration pre-filled:

$ chromium --proxy-server=socks://localhost:5000 \
    --user-data-dir=/tmp/foo

The --proxy-server=socks://localhost:5000 option does exactly what it says, it sets the SOCKS proxy configuration option. The --user-data-dir option is a nice addition, because this way you could have a completely separate user profile for the proxied browser.

As a final note, you can use dynamic port forwarding to do things like access a website available only in a specific country if you have a server example.com which is hosted in that country. Or you could connect to websites on your company’s private network as long as you can SSH to any server on the network.

Conclusion

We covered three ways of port forwarding:

Local port forwarding used for tunneling local connections to a port on a remote server.
Remote port forwarding used for tunneling remote connections to a port on a local server.
Dynamic port forwarding used for creating a TCP proxy via a remote host.

The given examples only scratch the surface of possible use cases. There are many cases where local or remote port forwarding can be useful during debugging multi-server architectures. You could even create multi-hop SSH tunnels where you tunnel from A to B, and then from B to C, e.g.

ssh -L 9999:host2:1234 -N host1

You can even use the first ssh command to run ssh on the remote host and create a second tunnel as the first one is created:

ssh -L 9999:localhost:9999 host1 ssh -L 9999:localhost:1234 -N host2

which is not only cool, but also creates a secure SSH tunnel from host1 to host2, as opposed to the first method, which does not (see this answer on SuperUser for more interesting examples).

Most importantly, play around and experiment with SSH when you get a chance! While not every combination of tunnels might be the best solution to your problem, there were certainly many times where knowing how to solve a problem using SSH tunnels saved me hours of otherwise tedious work (usually involving moving stuff around between servers).

Git Command Overview with Useful Flags and Aliases

Sun, 03 May 2020 23:28:14 +0200

This post is a short guide to making your git usage a little more efficient. We’re not going to cover how git works in depth. Instead, we’ll look at the most common operations and useful flags, with the goal of creating a set of bash or zsh aliases for daily use. A complete list of the aliases presented here is summarized at the end of the article.

Each section will first briefly describe the command and some of its useful flags, and then suggest a set of mnemonic aliases with their usage. Contrary to what some people might expect, we won’t use the builtin git alias functionality using git config (that is e.g. git config --global alias.co checkout), but rather plain shell aliases such as alias gco="git checkout". The reason is simple: it is much easier and faster to type gco than git co, which makes git usage more enjoyable. Since (almost) all of our aliases will be prefixed with g (such as ga, gco, gc, …), they will be just as easy to discover if you ever forget them as their git config alias counterparts.

`git status`

We’re going to skip ahead alphabetically and cover git status right now, as it will be useful in explaining the other commands. Everyone who has ever tried git had to write git status at some point, yet of all the people I’ve met, only a few know of its extremely useful variant git status -sb. Let me illustrate with a straightforward example where we simply create three files f1, f2, f3, modify some of them, and look at how the output of git status differs from git status -sb (I tried to make the example self-contained so you can try it yourself):

$ mkdir status-demo
$ cd status-demo
$ git init
Initialized empty Git repository in /home/darth/projects/status-demo/.git/
$ # This lets us make an empty commit
$ git commit -m "Initial commit" --allow-empty
[master (root-commit) 35773eb] Initial commit
$ touch f1 f2 f3
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        f1
        f2
        f3

nothing added to commit but untracked files present (use "git add" to track)
$ git status -sb
## master
?? f1
?? f2
?? f3
$ git add .
$ git commit -m "Add a few files"
[master b97a26a] Add a few files
 3 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 f1
 create mode 100644 f2
 create mode 100644 f3
$ echo x > f1
$ rm f2
$ echo x > f3
$ echo x > f4 # new file
$ git add f1
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   f1

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    f2
        modified:   f3

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        f4

$ git status -sb
## master
M  f1
 D f2
 M f3
?? f4

The code examples are intentionally a bit verbose to keep them reproducible. You can simply follow along command by command in your own terminal. Playing around with git is a great way to learn!

Since the code highlighter on this post does not capture the terminal output highlighting, here’s the last two commands in a screenshot, showing how git status and git status -sb share the same color highlighting, but only format the output differently.

git status -sb simply drops all of the text and additional information, and keeps the important parts of the output: what branch we are on, and what has changed in what way. It marks modifications with M, deletions with D, and untracked files with ??. The column in which the M/D is displayed also signifies whether the change was staged. For example, the modifications to f1 were staged with git add, hence the M displays in the left column and in yellow (same as regular git status). Modifications to f3 and the deletion of f2 were not staged, which is why they’re in green in the second column.
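The two-column layout is easier to internalize once it is written down as code. Here is a small JavaScript sketch that splits one entry of the short output into its staged column, its unstaged column and the path. `parseShortStatus` is a hypothetical helper written for this post, and it assumes simple entries like the ones above (no renames, no quoted paths, and not the ## branch line).

```javascript
// Split one `git status -sb` entry into its two status columns and the path.
// Assumes simple entries: no renames, no quoted paths, not the "##" line.
function parseShortStatus(line) {
  const staged = line[0] // first column: state of the index
  const unstaged = line[1] // second column: state of the working tree
  const path = line.slice(3)
  if (staged === "?" && unstaged === "?") {
    return { path, state: "untracked" }
  }
  return {
    path,
    staged: staged === " " ? null : staged,
    unstaged: unstaged === " " ? null : unstaged,
  }
}

const entries = ["M  f1", " D f2", " M f3", "?? f4"].map(parseShortStatus)
console.log(entries)
// f1: staged "M", f2: unstaged "D", f3: unstaged "M", f4: untracked
```

A file can also have a letter in both columns, for example when it was staged and then modified again.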

While this might be a little confusing at first, I promise that it only takes a couple of minutes to get used to, and that your git experience will be much improved by using git status -sb over the regular git status. Suddenly your terminal won’t fill half the screen with each status, and you won’t have to scroll around to find your previous commands after using git status a few times. It might seem like a small thing, but at least for me personally, this single command (along with the suggested alias) transformed my git usage from annoying to joyful.

To make matters even more controversial, I suggest a different alias for git status -sb, and that is:

alias s="git status -sb"

Now you might think this is insane, a single letter alias for an arbitrary git command? The reason is, at least in my personal experience, that this is the most common command I use out of all terminal commands. Here are the top 7 commands from my history:

1  648  6.48065%   s
2  640  6.40064%   cd
3  505  5.05051%   gc
4  410  4.10041%   vim
5  331  3.31033%   docker
6  292  2.92029%   ga
7  266  2.66027%   ls

You can create a similar statistic for yourself using the following command (reference):

history | awk '{CMD[$2]++;count++;}END { for (a in CMD)print CMD[a] " " CMD[a]/count*100 "% " a;}' | grep -v "./" | column -c3 -s " " -t | sort -nr | nl |  head -n10

At least in my case s is the winner and beats even cd, followed by gc (an alias for git commit -v, covered later). This makes sense, because every time you want to make a new git commit, you cd into the directory and check the status.

This statistic (at least for my git usage) confirms that git status -sb is not just a random command, it is the command, and as such it deserves a special alias. Some people suggest using alias gs="git status -sb", which might be leaning on the safe side, and I definitely started out that way. But is there really anything else that would deserve the glorious one letter s alias than git status -sb?

`git add`

Git separates the working tree (the files being edited) from the index, that is the staged changes which are ready to be committed. Before using git commit, we have to stage our changes using git add. But since this doesn’t have to include all of the changes, git add comes with quite a few options (check man git-add).

For example, git add -u only stages files that were modified (or deleted), but not newly added files. To add all modifications, deletions and additions, one can simply run git add . (note the trailing dot, which stands for the current directory). But sometimes we want to be more granular than adding whole files. This is where git add -p (or --patch) comes in, which launches an interactive mode, prompting the user with each change whether they want to add it to the index.
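To see the difference between the two, here is a small sketch you can run in a throwaway directory. The file names and the demo identity are made up; everything else is plain git.

```shell
# Throwaway repository; file names and identity are made up for the demo.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo two >> tracked.txt    # modify a file git already knows about
echo new > untracked.txt   # create a brand new file

git add -u
after_u=$(git status --short)    # only tracked.txt is staged

git add .
after_dot=$(git status --short)  # now the new file is staged as well

printf '%s\n---\n%s\n' "$after_u" "$after_dot"
```

After git add -u the new file still shows up as ??, and only git add . turns it into a staged addition.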

Aliases:

alias ga="git add"
alias gau="git add -u"
alias gap="git add -p"

`git branch`

Branching is an integral part of git, and as such the git branch command deserves at least one alias of its own. Lucky for us we can rely on git checkout for most of the branching shenanigans, and as such we only mention git branch --all. Deleting branches is sometimes useful as well, but as there are multiple ways - with some people preferring -d and some -D - we leave this out of the aliases.

alias gb="git branch"
alias gba="git branch --all"

`git commit`

Creating new commits is one of the most common operations when using git, and as such we want to have a decent setup for it. Firstly, git commit has a -v flag which causes it to show a complete diff when it opens the editor for the commit message.

One benefit of using git commit -v as opposed to git commit -m "Some message" is that you get one last chance to inspect what is being changed. This might be as simple as holding down Ctrl-D to scroll down in Vim in a matter of seconds, but there are certainly times where such quick visual inspection can catch unexpected files being committed (especially when the size of the diff is much larger/smaller than expected, or weird characters pop up). As such, we’ll use -v as a default option for all our git commit commands.

We can also use -a to automatically stage all modified files before committing (similar to git add -u).

Lastly, there is the --amend option, which gives us a way to fix the last commit if it hasn’t been pushed. Since git history is immutable, this does not actually change the commit, but instead creates a new one and resets HEAD to it. This is critical information, because it means you should not --amend after you used git push, that is, after some other computer contains the old commit. If you do git commit --amend after pushing, you will be required to force push (git push --force), as your tree is not a simple extension of the tree on the origin server, and needs to be overwritten by your local copy. This is similar to using git rebase or git reset --hard as we’ll see later. In simple terms, if you git push --force you’re in some sense overwriting origin, and if someone else ran git fetch (or git pull) in the meantime, they will have a different local tree that won’t be compatible with origin after you force push. There are definitely ways to work around these problems, but none of them are trivial, and they are out of the scope of this article.
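The fact that --amend creates a new commit instead of editing the old one is easy to check yourself. A minimal sketch in a throwaway repository (file name, messages and identity are made up):

```shell
# Throwaway repository; names and messages are made up for the demo.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo aaa > file.txt
git add file.txt
git commit -q -m "Initial comit"            # typo on purpose
before=$(git rev-parse HEAD)

git commit -q --amend -m "Initial commit"   # "fix" the message
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# The old commit object is still there, the branch just doesn't point at it.
git cat-file -t "$before"
```

The two hashes differ, which is exactly why an amended commit that was already pushed needs a force push.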

If you’re interested in a followup article covering git push --force or some other topic, feel free to leave a comment below the article or message me on twitter.

alias gc="git commit -v"
alias gca="git commit -v -a"
alias gcam="gca --amend"

Here you might want to opt out of the -a depending on your preference.

`git cherry-pick`

Cherry picking allows us to copy (apply) arbitrary commits to our HEAD. While not a common operation it does come in handy from time to time, especially when fixing previous git-related issues. As cherry-picking is very problem specific, we don’t introduce any default flags and only use the most basic alias:

alias gch="git cherry-pick"

`git checkout`

Git checkout is our first command that directly manipulates the working tree. First we introduce git checkout <BRANCH> as a way to switch between branches. We can also use this to switch our working tree to an arbitrary commit in the history (as in git checkout <REF>). Surprisingly to some, git checkout is better at creating new branches than git branch is, as it allows us to create the branch and switch to it with a single command git checkout -b <BRANCH>.

Lastly, we can use git checkout <FILE> to discard its changes and revert it back to the version in the index, or in case nothing is staged, to HEAD. A small example to illustrate this where we create and commit a new file, then make some changes to it, stage them, make some more changes, use git checkout to reset the unstaged changes, then use git reset to clear the stage, and again git checkout to reset the remaining changes.

One sidenote, consider if we had a file named master while also having a branch named master. If we wanted to use git checkout on the file and wrote git checkout master, git would actually refer to the branch instead. For this reason git provides a -- argument, which tells git to treat whatever comes after it as filenames. This means git checkout -- master will apply git checkout on a file named master. It’s a good idea to use this option every time you want to apply git checkout to a file, as you might forget you have a branch with the same name and get confused about the results (especially if you create lots of temporary branches/files named a or foo or test). For this reason we’ll use git checkout -- file.txt as opposed to git checkout file.txt.

$ mkdir checkout-demo
$ cd checkout-demo
$ echo aaa > file.txt
$ git init
Initialized empty Git repository in ~/checkout-demo/.git/
$ git add .
$ git commit -m "Initial commit"
[master (root-commit) 52ae6f1] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
$ echo bbb >> file.txt
$ git status -sb               #
## master                      #
 M file.txt           # -------- here the ` M` is
$ git add file.txt             # in the right column meaning
$ echo ccc >> file.txt         # unstaged changes only
$ git status -sb               #
## master                      #
MM file.txt           # -------- here we have `MM` for
$ cat file.txt                 # both staged and unstaged
aaa                            # changes
bbb                            #
ccc                            #
$ git checkout -- file.txt     #
$ git status -sb               #
## master                      #
M  file.txt            # ------- and finally here the `M `
$ cat file.txt                 # is on the left, signifying
aaa                            # staged changes only
bbb                            #
$ git reset                    #
Unstaged changes after reset:  #
M       file.txt       # ------- this is not what we *have after*,
$ git status -sb               # but what *was changed*
## master                      #
 M file.txt             # ------ confirming the output,
$ git checkout -- file.txt     # `git reset` unstaged our changes
$ git status -sb               # and now we're back to ` M`
## master
$ cat file.txt
aaa

The code examples are intentionally a bit verbose to keep them reproducible. You can simply follow along command by command in your own terminal. Playing around with git is a great way to learn!

If you found this example confusing due to the git status -sb outputs not being correctly highlighted, or are confused in general, here’s the same thing but in a properly colorized screenshot

As noted before, yellow means staged, green means unstaged, and git checkout modifies the unstaged changes, meaning it removes the green ones.

alias gco="git checkout"

`git diff`

Similarly to git status, checking the changes made to the working dir is a very common operation. The git diff has a useful flag which I would recommend using by default, and that is git diff -M. This will allow git diff to detect when a file was renamed. If you rename a file and don’t use -M, git diff will show one file as deleted and another one as newly added, as opposed to showing it as a rename.

Note that this option does not work 100% of the time, because git can’t know with certainty that a file was renamed. The filesystem doesn’t keep any kind of log of rename operations, nor is it written in any kind of metadata. Git will simply look at the contents of the two files, compare them, and if the amount of changes is small enough, it will consider the file as renamed. You can actually specify the threshold for changes with the -M flag, specifically -M90% would tell git to only consider something to be renamed if more than 90% of the file hasn’t been changed. By default this is set to 50%. This might seem silly at first, but consider renaming a file in your editor and then making changes to it. You wouldn’t necessarily commit things right after the file was renamed, yet you would still expect git to track the rename.
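Here is a small sketch of rename detection in action. The file names are made up, and the rename is staged with git mv, which is why the diff is taken with --cached. One hedge: as far as I know recent Git versions already detect renames in git diff by default (the diff.renames setting), so to make the contrast visible on any version the sketch turns detection off explicitly with --no-renames.

```shell
# Throwaway repository; file names and identity are made up for the demo.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
git add old.txt
git commit -q -m "Add old.txt"

git mv old.txt new.txt       # rename the file (this stages the rename)
echo "line 11" >> new.txt    # and make a small change on top of it
git add new.txt

# Without rename detection: one file deleted, another one added.
no_renames=$(git diff --cached --no-renames --name-status)
# With -M: a single rename entry with a similarity score.
with_renames=$(git diff --cached -M --name-status)

printf '%s\n---\n%s\n' "$no_renames" "$with_renames"
```

The second output starts with an R followed by the similarity score, which is the number the -M threshold is compared against.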

alias gd="git diff -M"

Personally I use git diff so often I devoted a second one-letter alias to it, specifically just d as opposed to gd. That is the following:

alias d="git diff -M"

As opposed to alias s="git status -sb" I’d say this one is more debatable, considering git diff is used much less often than git status.

There is one more useful flag with git diff, and that is --cached. The regular git diff only shows the diff between the working directory and the index, that is, changes you could possibly stage with git add. But sometimes you might have already staged some changes while leaving others unstaged (say with git add -p) and would like to see only the changes that would be committed. This is where git diff --cached comes in, which will show the difference between your staged changes and HEAD.

alias gdc="git diff -M --cached"

An honorary mention goes to the --word-diff flag, which is not common enough to make an alias, but still useful to know about and I suggest playing around with it (and also look it up in man git-diff). By default git diff will show how the whole line has changed, but with --word-diff it will try to show diff within the line itself (on single words). This can be useful when making small tweaks to long lines, such as READMEs or documentation, but less so on code (which is why it’s not the default).

`git fetch`

Sometimes people are surprised when they see me using git fetch followed by a git merge --ff-only and they ask why even bother running git fetch when you could just git pull, right? The problem with git pull is that it essentially does two things at once. It fetches the changes from the remote, and then merges (or rebases, depending on the flags) your HEAD with the changes, with the issue being that you don’t see what has changed before it begins with the merge.

Automating the merge is fine if it is what you actually wanted to happen, and for this reason I don’t think git pull is inherently a bad command. But more often than not I find people get surprised by the result of the pull, as it does something to their local copy that they didn’t expect. For that reason alone it might be useful to consider a combination of git fetch, looking at the changes with git log (as we’ll see shortly), and then manually merging/rebasing as needed.

By default git fetch will only fetch the tracking remote, but there is also the --all option which will fetch from all remotes.

alias gf="git fetch"
alias gfa="git fetch --all"

`git log`

Now for perhaps the one command which forces many people to use GUIs instead of the terminal interface to git. They simply can’t get useful information out of the default git log output, and I don’t blame them. Personally I can’t use git on a computer without a proper git log alias either, and often resort to either searching github for my own git log alias, or just using a GUI.

There are essentially two (or three) parts to making git log usable. One is forcing it to only print out one line per commit, which can either be done using --oneline, or a custom format (as shown shortly) with --pretty. The second is --graph, which visually represents branches and history. The third, and optional, is --all, which shows the history for all branches, not just the past of where HEAD is pointing to.

All of this combined is git log --graph --oneline --all, which when applied to Facebook’s React at the time of writing this article looks like this:

While this output is very useful and already infinitely better than the default, it does not display the author, when the commit was made, and doesn’t give us a way to customize its colors (which might be important if you have a custom color scheme).

Thus I present to you the full --pretty version.

The above screenshot was created with the following command:

git log --all --graph --pretty="format:%C(yellow)%h%C(auto)%d%Creset %s %C(white) %C(cyan)%an, %C(magenta)%ar%Creset"

It looks complicated, but if you look at it for a few seconds you can see it really isn’t. We’re just passing in a format string with two different kinds of placeholders. One is the colors, e.g. %C(yellow) or %C(auto), and the others are the actual content, such as %h or %s. You can certainly customize the command to your liking, and I suggest you look at man git-log in the PRETTY FORMATS section which lists all of the possible format string options.

Because sometimes we want to view the history of the current branch only, and sometimes we want to see all of it at once, we’ll resort to two aliases, differing only with the use of --all.

alias gl='git log --graph --pretty="format:%C(yellow)%h%C(auto)%d%Creset %s %C(white) %C(cyan)%an, %C(magenta)%ar%Creset"'
alias gla='gl --all'

`git merge`

Since branches are an integral part of git, merging branches is equally if not more important. There are two important kinds of merges, fast-forward and with a merge commit. Fast-forward means the current branch is simply set to the ref that is being merged into it. For example, if we’re fast-forward merging origin/master into master and we have no pending changes in the working directory, this is essentially equivalent to running git reset --hard origin/master, as the master label simply changes to a commit further in the history. But sometimes the target ref is not directly ahead and might be lying on a parallel branch, in which case git will create a merge commit that joins the two branches together.

There are three important flags which should be paid attention to:

--ff (default): tries to perform a fast-forward, and if it can’t it will create a merge commit
--no-ff: creates a merge commit every single time, even if a fast-forward is possible
--ff-only: tries to perform a fast-forward and exits with an error if it is not possible

Most problems arising during git usage come from git doing something that the user did not expect, which is why the first option (--ff) can be quite dangerous. You simply don’t know if a fast-forward will happen or if a merge commit will be created unless you examine the history and are certain how fast-forward works.

You might think that git will be able to fast-forward, run git merge --ff, and be surprised by the result and maybe have to revert it, or only realize it later and have even more work to fix. This is why I suggest to never use the default --ff option, but instead be explicit and either use --ff-only or --no-ff.

The benefit is that --ff-only is somewhat safer and thus you can run it without that many worries to basically check if a fast-forward is possible. If it’s not, the command will simply fail, and you can decide if you want to run the --no-ff version instead, or maybe re-examine the history and figure out why it failed.
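A sketch of how this plays out, again in a throwaway repository. The branch name topic, the file names and the identity are made up, and the default branch is looked up instead of assuming it is called master.

```shell
# Throwaway repository; branch/file names and identity are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo aaa > file.txt && git add file.txt && git commit -q -m "Initial commit"

git checkout -q -b topic
echo bbb > topic.txt && git add topic.txt && git commit -q -m "Work on topic"

# topic is directly ahead of base, so a fast-forward is possible.
git checkout -q "$base"
git merge -q --ff-only topic

# Now let both branches move on independently.
echo ccc > base.txt && git add base.txt && git commit -q -m "Work on base"
git checkout -q topic
echo ddd >> topic.txt && git commit -q -am "More work on topic"
git checkout -q "$base"

# The histories diverged: --ff-only refuses and changes nothing.
if git merge --ff-only topic >/dev/null 2>&1; then
  result="fast-forwarded"
else
  result="refused"
fi
echo "$result"

# At this point we decide explicitly that we want a merge commit.
git merge -q --no-ff -m "Merge topic" topic
git rev-list --merges --count HEAD
```

The point is that nothing surprising can happen: either the branch label moves forward, or the command fails and you choose what to do next.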

alias gm="git merge --no-ff"
alias gmf="git merge --ff-only"

`git push`

There’s not that much to be said about git push, other than the --tags flag which causes git to push all of the tags to the remote repo, meaning you can do git push --tags origin instead of git push origin <tag name>.

alias gp="git push"
alias gpt="git push --tags"

`git reset`

Manipulating the HEAD with git reset is one of the lesser understood commands, and can lead to some potentially dangerous situations. It has multiple modes, the three most useful ones are the following (all examples assume your working dir is clean and you have no pending changes):

git reset --soft: Moves the HEAD, but leaves the index and working dir as is, meaning if you run git reset --soft HEAD~ you’ll be in the state right before the last commit you made. Meaning all your changes are already staged.
git reset --mixed (default): Resets the index but leaves the working dir as is. As this is the default, the most common use case is to just run git reset without any arguments (which is exactly the same as git reset --mixed), which will unstage all of your changes, but leave the working dir intact.
git reset --hard: My favorite variant, and also the most dangerous one. This resets everything to the specified commit, including the working dir. If you run git reset --hard with no target, it will reset all of your staged and unstaged changes to HEAD, essentially saying please discard all the changes I have made. If you specify a target, say git reset --hard HEAD~ it means please discard all changes and set HEAD and my current branch to one commit ago, which can be useful if you made a commit and want to discard it completely (just be mindful and don’t do this if you’ve already pushed your changes, or if you’re not sure what you’re doing in general).

One last worthy mention is git reset --patch, which similarly to git add --patch (or -p) walks through the staged changes one by one and asks which of them you want to unstage. It only touches the index; HEAD and the working dir stay as they are.
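The three modes are easiest to tell apart by looking at git status after each of them. A minimal sketch, with made up file names and commit messages:

```shell
# Throwaway repository; names and messages are made up for the demo.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo aaa > file.txt && git add file.txt && git commit -q -m "First"
echo bbb >> file.txt && git commit -q -am "Second"

git reset -q --soft HEAD~      # undo the commit, keep the change staged
soft=$(git status --short)

git reset -q                   # --mixed is the default: unstage the change
mixed=$(git status --short)

git reset -q --hard            # throw the change away completely
hard=$(git status --short)

printf 'soft:  [%s]\nmixed: [%s]\nhard:  [%s]\n' "$soft" "$mixed" "$hard"
```

The M moves from the left column (staged) to the right column (unstaged) and then disappears, the same notation as in the git status -sb outputs earlier.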

The respective suggested aliases are:

alias gr="git reset"
alias grp="git reset --patch"
alias grh="git reset --hard"
alias grsh="git reset --soft HEAD~"

`git rebase`

Similarly to git reset, the git rebase command is one of the lesser understood but more dangerous commands. Some people use it to edit the history, but that’s not really what it does. The git history tree is immutable, and we can only ever add new commits to it, which is exactly what git rebase does. It will copy and re-apply existing commits in a different place, possibly modifying them along the way.

Let’s go through the example presented in the git rebase manpage (available at man git-rebase):

Assume the following history exists and the current branch is “topic”:

          A---B---C topic
         /
    D---E---F---G master

From this point, the result of either of the following commands:

git rebase master
git rebase master topic

would be:

                  A'--B'--C' topic
                 /
    D---E---F---G master

At first glance it might seem that git rebase somehow took the commits A, B, and C and moved them over to start at G, but if you look more closely you can see that they were renamed A', B' and C'. This is very much intentional, because git rebase is not moving the commits, it is simply creating new ones in a different place. The original commits are still in the tree, they are just not visible because nothing is pointing to them. If we were to write down the ref of C prior to doing the rebase, then ran the rebase, and ran git tag <NAME> <REF>, it would re-appear in the history as if by magic. That’s because it was there all along, git rebase only created its copy on top of G.
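The trick with the tag can be tried out on a miniature version of the history above. This is a sketch with a single commit on topic; the tag name old-topic, the file names and the identity are made up.

```shell
# Throwaway repository; tag/file names and identity are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo aaa > base.txt && git add base.txt && git commit -q -m "E"

git checkout -q -b topic
echo bbb > topic.txt && git add topic.txt && git commit -q -m "C"
old=$(git rev-parse HEAD)      # write down the ref of C before the rebase

git checkout -q "$base"
echo ccc >> base.txt && git commit -q -am "G"

git rebase -q "$base" topic    # re-apply C on top of G as C'
new=$(git rev-parse HEAD)

git tag old-topic "$old"       # give the original C a name again
git log --oneline --graph --all
```

The log shows both C' on top of G and the original C hanging off E, now labelled with the tag.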

The same would happen if we ran git rebase -i HEAD~5, that is, interactively rebase the last five commits. Sometimes people do this to squash commits before submitting a pull request. It is important to note that this does not replace those five commits with a new one, it only creates one new commit with the contents of the five, and resets the branch label (and HEAD) to it, making it appear as if the original commits were edited.

While rebase is replaying commits it might run into a conflict, in which case it will stop and ask the user to resolve the conflict. After the conflict is resolved, the rebase can continue with git rebase --continue. As this situation is quite common, we devote an alias to it as well.

alias grb="git rebase"
alias grbc="git rebase --continue"
alias grbi="git rebase -i"

A word of caution: If you decide to use git rebase on a production codebase, I suggest you take a look at git reflog first (man git-reflog) and play around with how different rebase variants get stored in the reflog so you can recover when things go wrong.

`git remote`

Listing and managing remotes is a common practice in any git-controlled project, even if it just means pushing to GitHub (or other). Sometimes we might not be certain where origin is, and that is where git remote -v comes in handy, as it will simply print out the list of remotes.

alias grv="git remote -v"

`git stash`

There’s not that many things worthy to mention with git stash, especially since you could accomplish the same using temporary branches and cherry-picking. But sometimes the changes are small enough to warrant the use of git stash instead, and that’s why we introduce a few handy aliases:

alias gst="git stash"
alias gstp="git stash pop"

`git show`

Last command on the list is git show, which simply tells git show me what changed in this commit (including the author, date, and commit message in detail). This might be useful both when checking what the last commit was (either git show HEAD or just git show without an argument), or when looking at a particular commit in the past.

The alias gw might not be what you immediately think of, but since other gs-prefixed ones are taken, it is at least somewhat phonetically reminiscent of the command itself.

alias gw="git show"

Conclusion

If you’ve read this far I hope you learned at least a thing or two. Git is a massive tool with hundreds of useful flags and sub-commands and there are definitely times where having that one extra trick up your sleeve can save you hours of pain. If you have any tips, suggestions, corrections, or feedback, please do leave a comment below or hit me up on twitter. I’ll be sure to reply to each and every comment.

Here is a complete list of aliases mentioned in the article, ready to be copy-pasted into your ~/.bashrc or ~/.zshrc or wherever else you store your aliases (sorry fish users).

alias s="git status -sb"
alias ga="git add"
alias gau="git add -u"
alias gap="git add -p"
alias gb="git branch"
alias gba="git branch --all"
alias gc="git commit -v"
alias gca="git commit -v -a"
alias gcam="gca --amend"
alias gch="git cherry-pick"
alias gco="git checkout"
alias gd="git diff -M"
alias d="git diff -M"
alias gdc="git diff -M --cached"
alias gf="git fetch"
alias gfa="git fetch --all"
alias gl='git log --graph --pretty="format:%C(yellow)%h%C(auto)%d%Creset %s %C(white) %C(cyan)%an, %C(magenta)%ar%Creset"'
alias gla='gl --all'
alias gm="git merge --no-ff"
alias gmf="git merge --ff-only"
alias gp="git push"
alias gpt="git push --tags"
alias gr="git reset"
alias grp="git reset --patch"
alias grh="git reset --hard"
alias grsh="git reset --soft HEAD~"
alias grb="git rebase"
alias grbc="git rebase --continue"
alias grbi="git rebase -i"
alias grv="git remote -v"
alias gst="git stash"
alias gstp="git stash pop"
alias gw="git show"

Eigenvalues and Eigenvectors: Basic Properties

Sat, 08 Dec 2018 01:44:08 +0100

Eigenvalues and eigenvectors of a matrix $\boldsymbol A$ tell us a lot about the matrix. On the other hand, if we know our matrix $\boldsymbol A$ is somehow special (say symmetric), this tells us something about what its eigenvalues and eigenvectors look like.

Let us begin with a definition. Given a matrix $\boldsymbol A$, a nonzero vector $x$ is an eigenvector of $\boldsymbol A$ with a corresponding eigenvalue $\lambda$, if

$$
\boldsymbol A x = \lambda x.
$$

The eigenvectors of a matrix $\boldsymbol A$ are exactly those vectors which, when transformed by the mapping defined by $\boldsymbol A$, are only scaled by $\lambda$, but their direction does not change.

Eigenvalues and eigenvectors of a projection matrix

To understand what eigenvectors are and how they behave, let us consider a projection matrix $\boldsymbol P$. What are the $x$’s and $\lambda$’s for a projection matrix?

The key property we’ll use is $\boldsymbol P^2 = \boldsymbol P$. This is because when we project a vector $x$ onto a plane to get $\hat x$, that is $\boldsymbol P x = \hat x$, we would expect projecting $\hat x$ again to do nothing, since it already lies in the plane, that is

$$
\hat x = \boldsymbol P \hat x = \boldsymbol P (\boldsymbol P x) = (\boldsymbol P \boldsymbol P) x = \boldsymbol P^2 x.
$$

Now thinking about eigenvectors as those vectors which don’t change direction when a projection matrix is applied, we can deduce two cases:

Any $x$ already in the plane: $\boldsymbol P x = x, \lambda = 1$.
Any $x$ perpendicular to the plane: $\boldsymbol P x = 0 x, \lambda = 0$.

As a result, a projection matrix $\boldsymbol P$ has two eigenvalues, $\lambda = 0$ and $\lambda = 1$, and two sets of eigenvectors: those that lie in the projection plane, and those that are perpendicular to it.

Eigenvalues of a $2 \times 2$ permutation matrix

One more small example: consider a $2 \times 2$ permutation matrix $\boldsymbol A = \begin{pmatrix}0 & 1 \\ 1 & 0 \end{pmatrix}$.

We can find the eigenvectors straight away, at least the first one, which is simply $x = (1\ 1)^T$, since $\boldsymbol A x = x$, and so its corresponding eigenvalue is $\lambda = 1$.

If we think a little harder, we can guess the second eigenvector to be $x = (-1\ 1)^T$, since $\boldsymbol A x = -x$, with an eigenvalue $\lambda = -1$.

Computing eigenvalues and eigenvectors

We can re-arrange the terms in our definition to get a direct way to compute eigenvalues and eigenvectors of a matrix $\boldsymbol A$. Simply move $\lambda x$ to the left

$$
\begin{aligned}
\boldsymbol A x &= \lambda x \\
(\boldsymbol A - \lambda \boldsymbol I) x &= 0
\end{aligned}
$$

and then notice that $\boldsymbol A - \lambda \boldsymbol I$ must be singular, because the nonzero vector $x$ lies in its nullspace. We know that singular matrices have a zero determinant, and we can use this to compute the eigenvalues $\lambda$ simply by writing

$$
\det (\boldsymbol A - \lambda \boldsymbol I) = 0.
$$

This is called the characteristic equation. The equation $\det(\boldsymbol A - \lambda \boldsymbol I) = 0$ gives us a polynomial of degree $n$, which we can use to find $n$ solutions $\lambda$. These need not be different, and can even be complex numbers. But once we obtain the $\lambda$’s we can plug them back into

$$
(\boldsymbol A - \lambda \boldsymbol I) x = 0
$$

and one by one obtain their corresponding eigenvectors $x$.
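As a quick sanity check, here is the recipe applied to the $2 \times 2$ permutation matrix $\boldsymbol A = \begin{pmatrix}0 & 1 \\ 1 & 0 \end{pmatrix}$ from the earlier example. This is just a worked sketch to confirm the guessed eigenvectors. The characteristic equation is

$$
\det(\boldsymbol A - \lambda \boldsymbol I) = \det \begin{pmatrix} -\lambda & 1 \\ 1 & -\lambda \end{pmatrix} = \lambda^2 - 1 = (\lambda - 1)(\lambda + 1) = 0,
$$

so $\lambda_1 = 1$ and $\lambda_2 = -1$. Plugging them back in one by one gives

$$
\begin{aligned}
(\boldsymbol A - \boldsymbol I) x &= \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} x = 0 &&\Rightarrow\quad x = (1\ 1)^T, \\
(\boldsymbol A + \boldsymbol I) x &= \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} x = 0 &&\Rightarrow\quad x = (-1\ 1)^T,
\end{aligned}
$$

which agrees with what we guessed before.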

Eigenvalues and eigenvectors of an upper triangular matrix

For a triangular matrix, the determinant is just the product of the diagonal entries

$$
\det(\boldsymbol A) = \prod_{i=1}^n \boldsymbol A_{ii}
$$

which means solving the characteristic equation of $\boldsymbol A$ simply amounts to multiplying out the diagonal

$$
\det(\boldsymbol A - \lambda \boldsymbol I) = \prod_{i=1}^n (\boldsymbol A - \lambda \boldsymbol I)_{ii},
$$

which gives us a factored polynomial $(\boldsymbol A_{11} - \lambda)(\boldsymbol A_{22} - \lambda)\ldots(\boldsymbol A_{nn} - \lambda)$, from which we immediately see that the eigenvalues are the diagonal elements.
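For a concrete (made up) example, take the upper triangular matrix $\boldsymbol A = \begin{pmatrix} 2 & 5 \\ 0 & 3 \end{pmatrix}$. Its characteristic equation is already factored,

$$
\det(\boldsymbol A - \lambda \boldsymbol I) = \det \begin{pmatrix} 2 - \lambda & 5 \\ 0 & 3 - \lambda \end{pmatrix} = (2 - \lambda)(3 - \lambda) = 0,
$$

so the eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 3$, exactly the diagonal. Note that the eigenvectors still have to be computed: $\lambda_1 = 2$ gives $x_1 = (1\ 0)^T$, and $\lambda_2 = 3$ gives $x_2 = (5\ 1)^T$, since

$$
\boldsymbol A x_2 = \begin{pmatrix} 2 & 5 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \end{pmatrix} = \begin{pmatrix} 15 \\ 3 \end{pmatrix} = 3 x_2.
$$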

Diagonalization $\boldsymbol S^{-1} \boldsymbol A \boldsymbol S = \boldsymbol \Lambda$

Suppose we have $n$ linearly independent eigenvectors of $\boldsymbol A$. Put them into the columns of $\boldsymbol S$. We now write

$$
\def\vertbar{{\rule[-1ex]{0.5pt}{2.5ex}}}
$$

$$
\boldsymbol A \boldsymbol S = \boldsymbol A \begin{bmatrix}
\vertbar & \vertbar & & \vertbar \\
x_1 & x_2 & \cdots & x_n \\
\vertbar & \vertbar & & \vertbar
\end{bmatrix} = \begin{bmatrix}
\vertbar & \vertbar & & \vertbar \\
\lambda_1 x_1 & \lambda_2 x_2 & \cdots & \lambda_n x_n \\
\vertbar & \vertbar & & \vertbar
\end{bmatrix} = \boldsymbol S \boldsymbol \Lambda
$$

where $\boldsymbol \Lambda$ is a diagonal matrix of eigenvalues. Thus we get $\boldsymbol A \boldsymbol S = \boldsymbol S \boldsymbol \Lambda$. Since the $n$ eigenvectors in the columns of $\boldsymbol S$ are independent, $\boldsymbol S$ is invertible, and we also get

$$
\begin{aligned}
\boldsymbol A \boldsymbol S &= \boldsymbol S \boldsymbol \Lambda \\
\boldsymbol S^{-1} \boldsymbol A \boldsymbol S &= \boldsymbol \Lambda \\
\boldsymbol A &= \boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1}
\end{aligned}
$$

The matrix $\boldsymbol A$ is sure to have $n$ independent eigenvectors (and be diagonalizable) if all the $\lambda$’s are different (no repeated $\lambda$’s). Repeated eigenvalues mean $\boldsymbol A$ may or may not have $n$ independent eigenvectors.

Proof (ref G. Strang, Introduction to LA): Suppose $c_1 x_1 + c_2 x_2 = 0$. Multiply by $\boldsymbol A$ to find $c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2 = 0$. Multiply the original equation by $\lambda_2$ instead to find $c_1 \lambda_2 x_1 + c_2 \lambda_2 x_2 = 0$. Now subtracting these two equations gives us

$$
(\lambda_1 - \lambda_2) c_1 x_1 = 0.
$$

Since $\lambda_1 \neq \lambda_2$ and $x_1 \neq 0$, we conclude $c_1 = 0$. We can derive $c_2 = 0$ the same way. Since $c_1 = c_2 = 0$ are the only coefficients for which $c_1 x_1 + c_2 x_2 = 0$, we see that $x_1$ and $x_2$ are linearly independent.

The same argument can be extended to $n$ eigenvectors and eigenvalues.

Sum of eigenvalues equals the trace

Another very useful fact is that the sum of the eigenvalues equals the sum of the main diagonal (called the trace of $\boldsymbol A$), that is

$$
\lambda_1 + \lambda_2 + \ldots + \lambda_n = \boldsymbol A_{11} + \boldsymbol A_{22} + \ldots + \boldsymbol A_{nn} = Tr(\boldsymbol A).
$$

To prove this we’ll first show that $Tr(\boldsymbol A \boldsymbol B) = Tr(\boldsymbol B \boldsymbol A)$.

To get a single element on the diagonal of $\boldsymbol A \boldsymbol B$ we simply write

$$
(\boldsymbol A \boldsymbol B)_{jj} = \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj}
$$

and to get the trace we just sum over all possible $j$ as

$$
Tr(\boldsymbol A \boldsymbol B) = \sum_{j} \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj}.
$$

On the other hand, the $k$-th element on the diagonal of $\boldsymbol B \boldsymbol A$ is

$$
(\boldsymbol B \boldsymbol A)_{kk} = \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk}
$$

and the trace $Tr(\boldsymbol B \boldsymbol A)$ is

$$
Tr(\boldsymbol B \boldsymbol A) = \sum_{k} \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk}.
$$

But since we can swap the order of summation, and also swap the order of multiplication (the individual entries are just numbers), we get

$$
Tr(\boldsymbol B \boldsymbol A) = \sum_{k} \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk} = \sum_{j} \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj} = Tr(\boldsymbol A \boldsymbol B).
$$

Now suppose we have $n$ different eigenvalues. We can diagonalize the matrix

$$
\boldsymbol S^{-1} \boldsymbol A \boldsymbol S = \boldsymbol \Lambda
$$

where $\boldsymbol \Lambda$ is a diagonal matrix of eigenvalues of $\boldsymbol A$. Using our trace trick we can write

$$
Tr(\boldsymbol \Lambda) = Tr(\boldsymbol S^{-1} \boldsymbol A \boldsymbol S) = Tr((\boldsymbol S^{-1} \boldsymbol A) \boldsymbol S) = Tr(\boldsymbol S (\boldsymbol S^{-1} \boldsymbol A)) = Tr((\boldsymbol S \boldsymbol S^{-1}) \boldsymbol A) = Tr(\boldsymbol I \boldsymbol A) = Tr(\boldsymbol A)
$$

and thus the sum of eigenvalues is equal to the trace of $\boldsymbol A$. We’ve only shown this for the case of $n$ different eigenvalues. This property does hold in general, but requires some properties we haven’t proven yet (Jordan normal form), and thus we skip the rest of the proof.

If you’re interested, check out the following article which shows the whole proof, and possibly the Wikipedia article on Jordan normal form.
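A quick check of the trace property on two small matrices. The first one is the $2 \times 2$ permutation matrix from before, whose eigenvalues we found to be $1$ and $-1$; the second one is a symmetric matrix I made up for illustration.

$$
\boldsymbol A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}: \quad \lambda_1 + \lambda_2 = 1 + (-1) = 0 = 0 + 0 = Tr(\boldsymbol A).
$$

$$
\boldsymbol B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}: \quad \det(\boldsymbol B - \lambda \boldsymbol I) = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3), \quad \lambda_1 + \lambda_2 = 1 + 3 = 4 = 2 + 2 = Tr(\boldsymbol B).
$$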

Powers of a matrix

If $\boldsymbol A x = \lambda x$, then we multiply by $\boldsymbol A$ and get

$$ +\boldsymbol A^2 x = \lambda \boldsymbol A x = \lambda^2 x. +$$

Continuing

$$ +\boldsymbol A^2 = \boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1} +\boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1} = \boldsymbol S \boldsymbol \Lambda^2 \boldsymbol S^{-1} +$$

or in general

$$ +\boldsymbol A^k = \boldsymbol S \boldsymbol \Lambda^k \boldsymbol S^{-1}. +$$

Theorem: $\boldsymbol A^k \rightarrow 0$ as $k \rightarrow \infty$ if all +$|\lambda_i| < 1$.
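To see the theorem in action, here is a small sketch in plain Python. The matrix is a hypothetical example chosen to be upper triangular, so its eigenvalues ($0.5$ and $0.8$) can be read off the diagonal; `matmul` and `matrix_power` are illustrative helpers, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(a, k):
    """Compute a^k for k >= 1 by repeated multiplication."""
    result = a
    for _ in range(k - 1):
        result = matmul(result, a)
    return result


# Upper triangular, so the eigenvalues are the diagonal entries 0.5 and 0.8.
A = [[0.5, 0.3], [0.0, 0.8]]

for k in (1, 10, 50):
    largest = max(abs(v) for row in matrix_power(A, k) for v in row)
    print(k, largest)  # the largest entry shrinks towards 0 as k grows
```

Since both eigenvalues are smaller than $1$ in absolute value, $\boldsymbol \Lambda^k$ shrinks to zero and drags $\boldsymbol A^k = \boldsymbol S \boldsymbol \Lambda^k \boldsymbol S^{-1}$ along with it.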

More properties

$\boldsymbol A$ and $\boldsymbol B$ share the same $n$ independent +eigenvectors if and only if $\boldsymbol A \boldsymbol B = \boldsymbol B +\boldsymbol A$.

This is true because $\boldsymbol A \boldsymbol B x = \lambda \beta x$ and +$\boldsymbol B \boldsymbol A x = \lambda \beta x$ since

$$ +\boldsymbol A \boldsymbol B x = \boldsymbol A \beta x = \beta \boldsymbol A x = \beta \lambda x. +$$

But this only holds if $\boldsymbol A$ and $\boldsymbol B$ share the same eigenvectors!

One last interesting fact we can show is what happens to the eigenvalues when we add a constant $c$ to the matrix $\boldsymbol A$, that is when we add $c \boldsymbol I$. The proof is rather trivial: if $\boldsymbol A x = \lambda x$, then

$$ +(\boldsymbol A + c) x = (\boldsymbol A + c \boldsymbol I) x = (\lambda + c) x. +$$

Adding a constant to a matrix causes its eigenvalues to increase by exactly +that constant.
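As a quick sanity check, the sketch below computes the eigenvalues of a $2 \times 2$ matrix from its trace and determinant and compares them before and after adding $c \boldsymbol I$. The matrix, the constant and the helper names are assumptions for illustration; the closed-form eigenvalue formula only covers the $2 \times 2$ case with real eigenvalues.

```python
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix with real eigenvalues, smallest first.

    Uses the roots of the characteristic polynomial
    lambda^2 - tr(m) * lambda + det(m) = 0.
    """
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return ((tr - root) / 2, (tr + root) / 2)


def add_constant(m, c):
    """Return m + c * I for a 2x2 matrix m."""
    return [[m[i][j] + (c if i == j else 0) for j in range(2)]
            for i in range(2)]


A = [[2, 1], [1, 2]]  # hypothetical symmetric example
c = 5

print(eigenvalues_2x2(A))                   # (1.0, 3.0)
print(eigenvalues_2x2(add_constant(A, c)))  # (6.0, 8.0)
```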


Mixture of Categoricals and Latent Dirichlet Allocation (LDA)

Wed, 05 Dec 2018 21:50:26 +0100

Now that we’ve worked through the Dirichlet-Categorical model in quite a bit of detail +we can move onto document modeling.

Let us begin with a very simple document model in which we consider only a single distribution +over words across all documents. We have the following variables:

$N_d$: number of words in $d$-th document.
$D$: number of documents.
$M$: number of words in the dictionary.
$\boldsymbol\beta = (\beta_1,\ldots,\beta_M)$: probabilities of each word.
$w_{nd} \sim Cat(\boldsymbol\beta)$: $n$-th word in $d$-th document.
$I(w_{nd} = m)$: indicator variable saying that the $n$-th word in the $d$-th document is $m$.

To summarize, there is only one random variable $w_{nd}$, which is observed +and has a Categorical distribution with a parameter $\boldsymbol\beta$. We can +fit this model using maximum likelihood estimation (MLE) directly as we’ve +shown before, +that is

$$ +\beta_m = \frac{c_m}{N} +$$

where $c_m$ is the number of occurrences of the $m$-th word and $N$ is the total number of word occurrences in all documents. It is important to distinguish between $M$ and $N$, as $M$ is the number of words in the dictionary, that is unique words, and $N$ is the sum of all the counts. Specifically:

$$ +\begin{align} +N &= \sum_{d=1}^D N_d \\\\ +c_m &= \sum_{d=1}^D \sum_{n=1}^{N_d} I(w_{nd} = m) +\end{align} +$$

In this case $c_m$ sums over documents, and in each document it sums up all the occurrences of the $m$-th word, that is summing over all indexes in that document and adding $1$ for each occurrence of $m$.
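The estimate $\beta_m = c_m / N$ amounts to counting words. A minimal sketch, assuming documents are given as lists of word strings (the function name and the toy corpus are made up for illustration):

```python
from collections import Counter


def fit_word_probabilities(documents):
    """MLE of a single shared word distribution: beta_m = c_m / N."""
    # c_m: occurrences of each word summed over all documents.
    counts = Counter(word for doc in documents for word in doc)
    # N: total number of word occurrences in all documents.
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


documents = [
    ["cat", "dog", "cat"],
    ["dog", "fish"],
]

beta = fit_word_probabilities(documents)
print(beta)  # {'cat': 0.4, 'dog': 0.4, 'fish': 0.2}
```

Here $M = 3$ (unique words) while $N = 5$ (total occurrences), which is the distinction made above.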

There are downsides to this simple model though. Sharing one $\boldsymbol\beta$ +between all documents means that all documents have the same distribution of words. +What we would like instead is allow each document to be about a different topic, +and have a different distribution of words for each topic.

Mixture of Categoricals

We’ll modify our model to allow it to have multiple topics, each of which will +have its own categorical distribution on words. We introduce a second random +variable $z_d \sim Cat(\boldsymbol\theta)$ where $\boldsymbol\theta$ is a +vector of topic probabilities.

$z_d$: assigns document $d$ to one of the $K$ categories.
$\theta_k = p(z_d = k)$: probability that a document $d$ is assigned to category $k$.
$w_{nd} | z_d \sim Cat(\boldsymbol\beta_{z_d})$: distribution over words +at $n$-th position in document $d$ given we have observed its topic $z_d$.

A small note: the distribution of $w_{nd}$ now depends on the document $d$ (on its topic), while in the previous model it remained constant throughout. We can write this model in its generative process specification as follows:

$$ +\begin{align} +z_d &\sim Cat(\boldsymbol\theta) \\\\ +w_{nd} | z_d &\sim Cat(\boldsymbol\beta_{z_d}) +\end{align} +$$

In order to allow our model to capture different topics, we introduced a set of +latent (hidden) variables $z_d$. Now the problem becomes, how do we perform +maximum likelihood estimation with these hidden variables? Let us write out the likelihood

$$
\begin{align}
p(\boldsymbol w | \boldsymbol\theta, \boldsymbol\beta)
&= \prod_{d=1}^D p(\boldsymbol w_d | \boldsymbol\theta, \boldsymbol\beta) \\\\
&= \prod_{d=1}^D \sum_{k=1}^K p(\boldsymbol w_d, z_d = k | \boldsymbol\theta, \boldsymbol\beta) \qquad\text{expanding the marginal}\\\\
&= \prod_{d=1}^D \sum_{k=1}^K p(\boldsymbol w_d | z_d = k, \boldsymbol\beta) p(z_d = k | \boldsymbol\theta) \\\\
&= \prod_{d=1}^D \sum_{k=1}^K \left( \prod_{n=1}^{N_d} p(w_{nd} | z_d = k, \boldsymbol\beta) \right) p(z_d = k | \boldsymbol\theta)
\end{align}
$$

We cannot easily optimize through the marginalization, and even if we were to +take the log likelihood

$$ +\log p(\boldsymbol w | \boldsymbol\theta, \boldsymbol\beta) += \sum_{d=1}^D \log \left( \sum_{k=1}^K \left( \prod_{n=1}^{N_d} p(w_{nd} +| z_d = k, \boldsymbol\beta) \right) p(z_d = k | \boldsymbol\theta) \right) +$$

we run into a problem with a sum inside a log, that is $\sum \log \left(\sum \ldots \right)$. We cannot move the log inside the sum, and solving the equation analytically is not possible, at least in the general form. This is where the Expectation-Maximization (EM) algorithm comes in and allows us to find a local optimum of the likelihood. But for now we’ll move on to the Bayesian approach and cover EM in a separate article in more depth.
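Even though we cannot maximize this expression in closed form, evaluating it for given parameters is straightforward. The sketch below does so in plain Python. It assumes words are encoded as integer indices, that all probabilities are strictly positive, and it uses the log-sum-exp trick for numerical stability, which is a choice made here rather than something the derivation requires.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mixture_log_likelihood(documents, theta, beta):
    """log p(w | theta, beta) for the mixture of categoricals.

    documents: list of documents, each a list of word indices
    theta:     theta[k] = p(z_d = k)
    beta:      beta[k][m] = probability of word m under topic k
    """
    total = 0.0
    for doc in documents:
        # log of (prod_n p(w_nd | z_d = k)) * p(z_d = k), one entry per topic k
        per_topic = [
            math.log(theta[k]) + sum(math.log(beta[k][w]) for w in doc)
            for k in range(len(theta))
        ]
        # The sum over topics sits inside the log, hence log-sum-exp.
        total += log_sum_exp(per_topic)
    return total


# Hypothetical toy setup: two topics, a dictionary of two words.
theta = [0.5, 0.5]
beta = [[0.9, 0.1], [0.1, 0.9]]
documents = [[0, 0], [1]]

print(mixture_log_likelihood(documents, theta, beta))
```

For this toy input the first document contributes $\log(0.5 \cdot 0.81 + 0.5 \cdot 0.01)$ and the second $\log(0.5 \cdot 0.1 + 0.5 \cdot 0.9)$.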

Bayesian Mixture of Categoricals

To move from the point estimate given by EM to a fully Bayesian treatment, we’ll introduce a prior distribution over the parameters $\boldsymbol\theta$ and $\boldsymbol\beta$. Since they are both parameters of a Categorical distribution, it is no surprise that our priors will be Dirichlet distributions.

The generative model specification then becomes

$$
\begin{align}
\boldsymbol\theta &\sim Dir(\boldsymbol\alpha) \\\\
\boldsymbol\beta_k &\sim Dir(\boldsymbol\gamma) \\\\
z_d | \boldsymbol\theta &\sim Cat(\boldsymbol\theta) \\\\
w_{nd} | z_d,\boldsymbol\beta_{z_d} &\sim Cat(\boldsymbol\beta_{z_d})
\end{align}
$$

where $\boldsymbol\alpha$ is the hyperparameter over topic probabilities, while +$\boldsymbol\gamma$ is the hyperparameter over dictionary probabilities.

Now that we have our priors, we could compute a MAP estimate using EM, but we’ll instead go one step further, extend our model to Latent Dirichlet Allocation (LDA), and then cover full Bayesian inference using Gibbs sampling.

Latent Dirichlet Allocation (LDA)

One limitation of the mixture of categoricals model is that words in each +document are drawn only from one specific topic. The problem is when we have +documents that span more than one topic, in which case we need to learn a +mixture of those topics. We also allow the distribution of topics to vary +across documents.

In LDA, each document becomes a mixture of topics, but each word is still drawn from one of those topics. We don’t need to introduce new kinds of random variables, we’ll simply create more of what we have. In the mixture of categoricals we had $z_d$ be the topic of the $d$-th document. We replace it with $z_{nd}$, which becomes the topic of the $n$-th word in the $d$-th document. We also introduce $\boldsymbol\theta_d$, which is a distribution over topics for the $d$-th document.

A generative model then becomes:

$$
\begin{align}
\boldsymbol\theta_d &\sim Dir(\boldsymbol\alpha) \\\\
\boldsymbol\beta_k &\sim Dir(\boldsymbol\gamma) \\\\
z_{nd} | \boldsymbol\theta_d &\sim Cat(\boldsymbol\theta_d) \\\\
w_{nd} | z_{nd}, \boldsymbol\beta_{z_{nd}} &\sim Cat(\boldsymbol\beta_{z_{nd}})
\end{align}
$$

We can view this generative process as a sequential recipe for generating a set +of documents:

For each document $d$, draw $\boldsymbol\theta_d$ from the prior distribution $Dir(\boldsymbol\alpha)$. This is the parameter for our Categorical distribution over topics.
For each topic $k$, draw $\boldsymbol\beta_k$ from the prior +$Dir(\boldsymbol\gamma)$. This is the parameter for our Categorical +distribution over words in that topic.
For each $z_{nd}$, that is each word position $n$ in a document $d$ draw its +topic from $Cat(\boldsymbol\theta_d)$. This allows us to have each word in +a document drawn from a different topic.
Draw each word $w_{nd}$ from its corresponding $Cat(\boldsymbol\beta_{z_{nd}})$.

The critical part here is in the last step, where each word at position $n$ in +document $d$ is drawn from $Cat(\boldsymbol\beta_{z_{nd}})$. The parameter +$\boldsymbol\beta_{z_{nd}}$ of the distribution of $w_{nd}$ depends on +$z_{nd}$, that is $p(w_{nd} | z_{nd})$. Even though each document $d$ has +only a single distribution over topics $Cat(\boldsymbol\theta_d)$, we draw a topic +$z_{nd}$ for each word position $n$ in the document $d$.

We can think of $z_{nd}$ as defining a distribution over words at position $n$ +in document $d$. The problem with viewing it directly as a distribution is +that in such view we ignore the topics and don’t share word probabilities among +different positions from the same topic. That’s why we keep a separate set of +random variables $\boldsymbol\beta_k$, which define the distribution over +words in a topic $k$, and $z_{nd}$ simply acts as an index into one of those +$\boldsymbol\beta_k$.

This allows us to draw the topic of each position in each document independently, while sharing the probabilities of words within the same topic between documents.
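The generative recipe above translates almost line by line into code. The following is a minimal sketch using only the Python standard library. The function names, the hyperparameter values and the encoding of words as integer indices are assumptions for illustration, and the Dirichlet draw is implemented by normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(alpha, gamma, doc_lengths, rng):
    """Follow the LDA recipe: beta_k per topic, theta_d per document,
    then a topic z_nd and a word w_nd for every position."""
    num_topics = len(alpha)
    # One word distribution per topic, shared by all documents.
    beta = [sample_dirichlet(gamma, rng) for _ in range(num_topics)]
    corpus = []
    for n_d in doc_lengths:
        theta_d = sample_dirichlet(alpha, rng)  # topic distribution of document d
        words = []
        for _ in range(n_d):
            z_nd = sample_categorical(theta_d, rng)            # topic of this position
            words.append(sample_categorical(beta[z_nd], rng))  # word from that topic
        corpus.append(words)
    return corpus


rng = random.Random(0)
# Hypothetical setup: K = 3 topics, a dictionary of 6 words, two documents.
corpus = generate_corpus(alpha=[0.5, 0.5, 0.5], gamma=[0.5] * 6,
                         doc_lengths=[8, 5], rng=rng)
print(corpus)
```

Note that `z_nd` is only used as an index into `beta`, which is how word probabilities get shared between positions and documents that use the same topic.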

Just to summarize, we have the following count constants:

$D$ is the number of documents.
$N_d$ is the number of words in document $d$.
$K$ is the number of topics.

and the following hyperparameters, parameters and random variables:

One hyperparameter $\boldsymbol\alpha$ for the prior distribution on topics +for each document.
One hyperparameter $\boldsymbol\gamma$ for the prior distribution on words in +each topic.
A set of $D$ (number of documents) parameters $\boldsymbol\theta_{1:D}$ for our +per-document topic distribution.
A set of $K$ parameters $\boldsymbol\beta_{1:K}$ for our per-topic word distribution.
A set of $\sum_{d=1}^D N_d$ random variables $z_{nd}$, one for each position in each document, signifying the topic of the word at that position.
A set of $\sum_{d=1}^D N_d$ random variables $w_{nd}$ representing the actual word at a given position in each document, drawn from the $z_{nd}$-th topic.

Inference in LDA

Similarly to the Mixture of Categoricals, if we wanted to compute the posterior over our parameters $\boldsymbol\beta_{1:K}$ and $\boldsymbol\theta_{1:D}$ given we observed words $w_{nd}$, we’d need to marginalize out the latent variables $z_{nd}$. Let us write out the posterior first:

$$ +p(\boldsymbol\beta, \boldsymbol\theta, \boldsymbol z | \boldsymbol w, \alpha, \gamma) = +\frac{p(\boldsymbol\beta, \boldsymbol\theta, \boldsymbol z, \boldsymbol w | \alpha, \gamma)}{ +p(\boldsymbol w | \alpha, \gamma) +} +$$

where

$$ +p(\boldsymbol\beta, \boldsymbol\theta, \boldsymbol z, \boldsymbol w | \gamma, \alpha) += \prod_{k=1}^K p(\boldsymbol\beta_k | \gamma) \prod_{d=1}^D \left( +p(\boldsymbol\theta_d | \alpha) +\prod_{n=1}^{N_d} \left( p(z_{nd} | \boldsymbol\theta_d) p(w_{nd} | \boldsymbol\beta, z_{nd}) \right) +\right) +$$

and the normalization constant $p(\boldsymbol w | \alpha, \gamma)$ written out as

$$
p(\boldsymbol w | \alpha, \gamma) = \int \int \sum_{\boldsymbol z} \prod_{k=1}^K p(\boldsymbol\beta_k | \gamma) \prod_{d=1}^D \left(
p(\boldsymbol\theta_d | \alpha)
\prod_{n=1}^{N_d} p(z_{nd} | \boldsymbol\theta_d) p(w_{nd} | \boldsymbol\beta, z_{nd})
\right)\ d\boldsymbol\beta\ d\boldsymbol\theta
$$

is intractable, since we’d need to marginalize out the latent variables +$z_{nd}$. If every document had $N$ words, this means $K^N$ configurations per +document.

Although the posterior is intractable for exact inference, we can use many approximate +inference algorithms, e.g. Markov-Chain Monte Carlo and variational inference. In the next +article we’ll see how to apply Gibbs sampling to LDA.


Posterior Predictive Distribution for the Dirichlet-Categorical Model (Bag of Words)

Tue, 04 Dec 2018 14:22:58 +0100

In the previous article we derived a maximum likelihood estimate (MLE) for the +parameters of a Multinomial distribution. This time +we’re going to compute the full posterior of the Dirichlet-Categorical model as +well as derive the posterior predictive distribution. This will close our +exploration of the Bag of Words model.

Likelihood

As in the previous article, our likelihood will be defined by a Multinomial distribution, that is

$$
p(D|\boldsymbol\pi) \propto \prod_{i=1}^m \pi_i^{x_i}.
$$

Since the Dirichlet distribution is a conjugate prior to the Multinomial, we +can omit the normalization constants as we will be able to infer them +afterwards from the unnormalized posterior parameters. Knowing that the +posterior is again a Dirichlet distribution saves us a lot of tedious work.

Prior

Much like the model name would suggest, our prior will be the Dirichlet distribution, which defines a distribution over the probability simplex of the Multinomial’s $m$ parameters. The prior has the form

$$ +p(\boldsymbol\pi|\boldsymbol\alpha) = \frac{1}{B(\boldsymbol\alpha)} +\prod_{i=1}^m \pi_i^{\alpha_i - 1}. +$$

Posterior

Multiplying the likelihood by the prior will directly give us the shape of the posterior +because of the conjugacy. We don’t have to care about the normalizing constant. As a result, we obtain

$$ +\begin{align} +p(\boldsymbol\pi | D) &\propto p(D|\boldsymbol\pi) p(\boldsymbol\pi | \boldsymbol\alpha) \\\\ +&= \prod_{i=1}^m \pi_i^{x_i} \prod_{i=1}^m \pi_i^{\alpha_i - 1} \\\\ +&\propto \prod_{i=1}^m \pi_i^{\alpha_i + x_i - 1} \\\\ +&\propto Dir(\boldsymbol\pi | \alpha_1 + x_1, \alpha_2 + x_2, \ldots, \alpha_m + x_m) +\end{align} +$$

We can write this more succinctly as $Dir(\boldsymbol\pi | \boldsymbol\alpha + \boldsymbol x)$ where $\boldsymbol x$ is the vector of counts of the observed data $D$.

MAP estimate of the parameters

Since we have our posterior, we can take a small detour and compute the maximum a posteriori (MAP) estimate of the parameters, which is simply the mode of the posterior (its maximum). We can do this similarly to the previous article and use Lagrange multipliers to enforce the constraint that $\sum_{i=1}^m \pi_i = 1$. Since the Dirichlet distribution is again of the exponential family, we differentiate the log posterior, which in turn is (up to an additive constant) the log likelihood plus the log prior

$$
\log p(\boldsymbol\pi | D) = \log p(D | \boldsymbol\pi) + \log p(\boldsymbol\pi | \boldsymbol\alpha) + \text{const}.
$$

The Lagrangian then has the following form

$$ +L(\boldsymbol\pi, \lambda) = \sum_{i=1}^m x_i \log \pi_i + \sum_{i=1}^m +(\alpha_i - 1) \log \pi_i + \lambda \left( 1 - \sum_{i=1}^m \pi_i \right). +$$

Same as before, we differentiate the Lagrangian with respect to $\pi_i$

$$
\frac{\partial}{\partial\pi_i} L(\boldsymbol\pi, \lambda) =
\frac{x_i}{\pi_i} + \frac{\alpha_i - 1}{\pi_i} - \lambda = \frac{x_i + \alpha_i - 1}{\pi_i} - \lambda
$$

and set it equal to zero

$$ +\begin{align} +0 &= \frac{x_i + \alpha_i - 1}{\pi_i} - \lambda \\\\ +\lambda &= \frac{x_i + \alpha_i - 1}{\pi_i} \\\\ +\pi_i &= \frac{x_i + \alpha_i - 1}{\lambda}. +\end{align} +$$

Finally, we can apply the same trick as before and solve for $\lambda$

$$ +\begin{align} +\pi_i &= \frac{x_i + \alpha_i - 1}{\lambda} \\\\ +\sum_{i=1}^m \pi_i &= \sum_{i=1}^m \frac{x_i + \alpha_i - 1}{\lambda} \\\\ +1 &= \sum_{i=1}^m \frac{x_i + \alpha_i - 1}{\lambda} \\\\ +\lambda &= \sum_{i=1}^m \left( x_i + \alpha_i - 1 \right) \\\\ +\lambda &= n - m + \sum_{i=1}^m \alpha_i. +\end{align} +$$

We can plug this back in to get the MAP estimate

$$ +\pi_i = \frac{x_i + \alpha_i - 1}{n + \left(\sum_{i=1}^m \alpha_i \right) - m}. +$$

Comparing this with the MLE estimate, which was

$$ +\pi_i = \frac{x_i}{n} +$$

we can see the concentration parameter $\boldsymbol\alpha$ affects the +probability. If we were to set a uniform prior with $\alpha_i=1$, we would +recover the original MLE estimate.
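A small numerical comparison makes the effect of the prior concrete. This is a sketch with made-up counts; the function names are illustrative, and the MAP formula is only the posterior mode when every $x_i + \alpha_i \geq 1$ and the denominator is positive, which the example values satisfy.

```python
def mle_estimate(counts):
    """MLE: pi_i = x_i / n."""
    n = sum(counts)
    return [x / n for x in counts]


def map_estimate(counts, alpha):
    """MAP: pi_i = (x_i + alpha_i - 1) / (n + sum(alpha) - m)."""
    n, m = sum(counts), len(counts)
    denominator = n + sum(alpha) - m
    return [(x + a - 1) / denominator for x, a in zip(counts, alpha)]


counts = [3, 1, 0]  # hypothetical observed word counts

print(mle_estimate(counts))             # [0.75, 0.25, 0.0]
print(map_estimate(counts, [1, 1, 1]))  # [0.75, 0.25, 0.0], same as the MLE
print(map_estimate(counts, [2, 2, 2]))  # numerators 4, 2, 1 over denominator 7
```

With $\alpha_i = 2$ the word that was never observed no longer gets probability zero, which is the smoothing effect of the prior.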

Posterior predictive

The posterior predictive distribution gives us a distribution over the possible outcomes while taking into account our uncertainty in the parameters given by the posterior distribution. For a general model with an outcome $X$ and a parameter vector $\boldsymbol\theta$ the posterior predictive is given by the following

$$ +p(X|D) = \int p(X | \boldsymbol\theta, D) p(\boldsymbol\theta | D)\ d\boldsymbol\theta +$$

Before we can integrate this, let us introduce a small +trick. For any +$\boldsymbol\theta = (\theta_1,\ldots,\theta_m)$ let us define +$\theta_{\neg j} = (\theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, +\ldots, \theta_m)$, that is all $\theta_i$ except for $\theta_j$. +Using this we can write a marginal $p(\theta_j)$ as

$$ +\int p(\theta_j, \theta_{\neg j})\ d \theta_{\neg j} = p(\theta_j) +$$

The posterior predictive, writing $p(\boldsymbol\theta)$ as shorthand for the posterior $p(\boldsymbol\theta | D)$,

$$
p(X = j | D) = \int p(X = j | \boldsymbol\theta) p(\boldsymbol\theta)\ d\boldsymbol\theta
$$

can then be re-written using this trick as a double integral

$$ +\int_{\theta_j} \int_{\theta_{\neg j}} p(X = j | \boldsymbol\theta) +p(\boldsymbol\theta)\ d\theta_{\neg j}\ d\theta_j. +$$

Posterior predictive for a single-trial Dirichlet-Categorical

If we’re considering a single-trial multinomial (multinoulli) we have $p(X = j | \boldsymbol\pi) = \pi_j$, which is independent of $\pi_{\neg j}$, simplifying the above expression to

$$
\int_{\pi_j} \pi_j \int_{\pi_{\neg j}} p(\boldsymbol\pi)\ d\pi_{\neg j}\ d\pi_j.
$$

Now applying the marginalization trick we get $\int_{\pi_{\neg j}} p(\boldsymbol\pi)\ d\pi_{\neg j} = p(\pi_j)$ and our posterior predictive has the form

$$ +\int_{\pi_j} \pi_j p(\pi_j)\ d\pi_j. +$$

Looking more closely at the formula, we can see this is an expectation of +$\pi_j$ under the posterior, that is

$$ +\int_{\pi_j} \pi_j p(\pi_j | D)\ d\pi_j = E[\pi_j | +D] = \frac{\alpha_j + x_j}{\sum_{i=1}^m \left( \alpha_i + x_i +\right)} = \frac{\alpha_j + x_j}{\alpha_0 + N} +$$

where $\alpha_0 = \sum_{i=1}^m \alpha_i$ and $N = \sum_{i=1}^m x_i$. +Repeating the result one more time for clarity, the posterior predictive +for a single trial Multinomial (Multinoulli) is given by

$$ +p(X=j | D) = \frac{\alpha_j + x_j}{\alpha_0 + N} +$$
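In code, the single-trial posterior predictive is just the normalized sum of prior parameters and observed counts. A minimal sketch with hypothetical counts and a uniform prior (the function name is made up for illustration):

```python
def posterior_predictive(counts, alpha):
    """p(X = j | D) = (alpha_j + x_j) / (alpha_0 + N) for every outcome j."""
    total = sum(alpha) + sum(counts)  # alpha_0 + N
    return [(a + x) / total for a, x in zip(alpha, counts)]


counts = [3, 1, 0]  # hypothetical observed counts x
alpha = [1, 1, 1]   # uniform Dirichlet prior

predictive = posterior_predictive(counts, alpha)
print(predictive)  # numerators 4, 2, 1 over alpha_0 + N = 7
```

Unlike the MLE, the outcome that was never observed still receives non-zero probability.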

Posterior predictive for a general multi-trial Dirichlet-Multinomial

Generalizing the posterior predictive to a Dirichlet-Multinomial model with multiple trials is going to be a little bit more work. Let us begin by writing the posterior predictive in its full form (note we drop the conditioning on $D$ in the likelihood for brevity, and because it is not needed). To avoid notation clashes, let us replace the posterior parameter $\boldsymbol\alpha + \boldsymbol x$ by $\boldsymbol\alpha'$, so we’ll write $Dir(\boldsymbol\alpha')$ and $\alpha_i'$ in place of $Dir(\boldsymbol\alpha + \boldsymbol x)$ and $\alpha_i + x_i$. From here on $\boldsymbol x$ denotes the counts of the new outcome $X$ over $n$ trials.

$$
\begin{align}
p(X|D) &= \int p(X | \boldsymbol\pi) p(\boldsymbol\pi | D)\ d\boldsymbol\pi \\\\
&= \int Mult(X | \boldsymbol\pi) Dir(\boldsymbol\pi | \boldsymbol\alpha') \ d\boldsymbol\pi \\\\
&= \int \left(\frac{n!}{x_1! \ldots x_m!} \prod_{i=1}^m \pi_i^{x_i} \right)
\left(\frac{1}{B(\boldsymbol\alpha')} \prod_{i=1}^m \pi_i^{\alpha_i' - 1} \right) \ d\boldsymbol\pi \\\\
&= \frac{n!}{x_1! \ldots x_m!} \frac{1}{B(\boldsymbol\alpha')}
\int \prod_{i=1}^m \pi_i^{x_i} \prod_{i=1}^m \pi_i^{\alpha_i' - 1} \ d\boldsymbol\pi \\\\
&= \frac{n!}{x_1! \ldots x_m!} \frac{1}{B(\boldsymbol\alpha')}
\int \prod_{i=1}^m \pi_i^{x_i + \alpha_i' - 1} \ d\boldsymbol\pi \\\\
&= \frac{n!}{x_1! \ldots x_m!} \frac{1}{B(\boldsymbol\alpha')} B(\boldsymbol\alpha' + \boldsymbol x)
\end{align}
$$

where in the last equality we made use of knowing that the integral of an +unnormalized Dirichlet distribution is $B(\boldsymbol\alpha)$. Let us repeat +the definition of $B(\boldsymbol\alpha)$ again, that is

$$ +B(\boldsymbol\alpha) = \frac{\prod_{i=1}^m \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^m \alpha_i)} +$$

and plugging this back into the formula we computed


$$
\begin{align}
p(X|D) &= \frac{n!}{x_1! \ldots x_m!} \frac{1}{B(\boldsymbol\alpha')} B(\boldsymbol\alpha' + \boldsymbol x)\\\\
&= \frac{n!}{x_1! \ldots x_m!} \frac{\Gamma(\sum_{i=1}^m \alpha_i')}{\prod_{i=1}^m \Gamma(\alpha_i')}
\frac{\prod_{i=1}^m \Gamma(\alpha_i' + x_i)}{\Gamma(\sum_{i=1}^m (\alpha_i' + x_i))}
\end{align}
$$

To move forward, we need to introduce a more general form for the multinomial distribution which allows for non-integer counts. All it comes down to is replacing factorials with the gamma function, that is instead of

$$
p(\boldsymbol x | \boldsymbol\pi, n) = \frac{n!}{x_1!\ldots x_m!} \prod_{i=1}^m \pi_i^{x_i}
$$

we write

$$
p(\boldsymbol x | \boldsymbol\pi, n) = \frac{\Gamma\left(1 + \sum_{i=1}^m x_i\right)}{\prod_{i=1}^m \Gamma(x_i + 1)} \prod_{i=1}^m \pi_i^{x_i}.
$$

Since only the normalizing constant changed, we can plug it back into our posterior predictive formula

$$
p(X|D) = \frac{\Gamma\left(1 + \sum_{i=1}^m x_i\right)}{\prod_{i=1}^m \Gamma(x_i + 1)}
\frac{\Gamma(\sum_{i=1}^m \alpha_i')}{\prod_{i=1}^m \Gamma(\alpha_i')}
\frac{\prod_{i=1}^m \Gamma(\alpha_i' + x_i)}{\Gamma(\sum_{i=1}^m (\alpha_i' + x_i))}
$$

which, although ugly, is the posterior predictive distribution in closed form :)
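The closed form is easiest to evaluate in log space with the log-gamma function. The sketch below is one way to do that using only the Python standard library; the function names and the example posterior parameters are assumptions for illustration. It checks two things: the probabilities of all count vectors with $n$ trials sum to one, and for $n = 1$ the result reduces to the single-trial formula.

```python
import math
from itertools import product


def log_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_multinomial_pmf(x, alpha_post):
    """p(x | D) for new counts x under the posterior Dir(alpha_post)."""
    n = sum(x)
    # log of Gamma(n + 1) / prod_i Gamma(x_i + 1), the multinomial coefficient
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    shifted = [a + xi for a, xi in zip(alpha_post, x)]
    return math.exp(log_coef + log_beta(shifted) - log_beta(alpha_post))


# Hypothetical posterior parameters alpha' = alpha + observed counts.
alpha_post = [4.0, 2.0, 1.0]
n = 3

# All count vectors (x_1, x_2, x_3) with x_1 + x_2 + x_3 = n.
outcomes = [x for x in product(range(n + 1), repeat=3) if sum(x) == n]
total = sum(dirichlet_multinomial_pmf(x, alpha_post) for x in outcomes)
print(total)  # close to 1

# With a single trial we recover alpha'_j / sum(alpha'), here 4 / 7.
print(dirichlet_multinomial_pmf((1, 0, 0), alpha_post))
```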


Maximum Likelihood for the Multinomial Distribution (Bag of Words)

Mon, 03 Dec 2018 23:52:59 +0100

In this short article we’ll derive the maximum likelihood estimate (MLE) of the +parameters of a Multinomial distribution. If you need a refresher on the +Multinomial distribution, check out the previous article.

Let us begin by repeating the definition of a Multinomial random variable. Consider the bag of words model where we’re counting the number of occurrences of each word in a document, where the words are generated from a fixed dictionary. The probability mass function (PMF) is defined as

$$
p(\boldsymbol x | \boldsymbol \pi, n) = \frac{n!}{x_1! x_2! \ldots x_m!} \prod_{i=1}^m \pi_i^{x_i} = n! \prod_{i=1}^m \frac{\pi_i^{x_i}}{x_i!}
$$

where $\pi_i$ is the probability of the $i$-th word, $x_i$ is the number of occurrences of that word, $m$ is the number of words in the dictionary, and $n$ is the total number of occurrences of all words.

Since the Multinomial distribution comes from the exponential family, we know computing the log-likelihood will give us a simpler expression, and since $\log$ is monotonically increasing, maximizing the log-likelihood is equivalent to maximizing the original likelihood function.

Now taking the log-likelihood

$$
\begin{align}
\log L(\boldsymbol \pi) &= \log \left( n! \prod_{i=1}^m \frac{\pi_i^{x_i}}{x_i!} \right) \\\\
&= \log n! + \sum_{i=1}^m x_i \log \pi_i - \sum_{i=1}^m \log x_i!.
\end{align}
$$

Before we can differentiate the log-likelihood to find the maximum, we need to introduce +the constraint that all probabilities $\pi_i$ sum up to $1$, that is

$$ +\sum_{i=1}^m \pi_i = 1. +$$

The Lagrangian with the constraint then has the following form

$$ +\mathcal{L}(\boldsymbol \pi, \lambda) = \log L(\boldsymbol \pi) + \lambda (1 - \sum_{i=1}^m \pi_i). +$$

To find the maximum, we differentiate the lagrangian w.r.t. $\pi_i$ as follows

$$
\begin{align}
\frac{\partial}{\partial\pi_i} \mathcal{L}(\boldsymbol\pi, \lambda) &=
\frac{\partial}{\partial\pi_i}\log L(\boldsymbol \pi) + \frac{\partial}{\partial\pi_i} \lambda \left(1 - \sum_{j=1}^m \pi_j\right) \\\\
&= \frac{\partial}{\partial\pi_i}\log L(\boldsymbol \pi) - \lambda \\\\
&= \frac{\partial}{\partial\pi_i} \left(\log n! + \sum_{j=1}^m x_j \log \pi_j - \sum_{j=1}^m \log x_j! \right) - \lambda \\\\
&= \frac{x_i}{\pi_i} - \lambda.
\end{align}
$$

Finally, setting the derivative of the Lagrangian equal to zero allows us to compute the extremum as

$$ +\pi_i = \frac{x_i}{\lambda}. +$$

To solve for $\lambda$, we sum both sides and make use of our initial constraint

$$ +\begin{align} +\pi_i &= \frac{x_i}{\lambda} \\\\ +\sum_{i=1}^m \pi_i &= \sum_{i=1}^m \frac{x_i}{\lambda} \\\\ +1 &= \frac{1}{\lambda }\sum_{i=1}^m x_i \\\\ +1 &= \frac{1}{\lambda} n \\\\ +\lambda &= n \\\\ +\end{align} +$$

giving us the final form of the MLE for $\pi_i$, that is

$$ +\pi_i = \frac{x_i}{n} +$$

which is what we would expect. The MLE for a word is exactly its frequency in the document.

Dirichlet-Categorical Model

Sun, 02 Dec 2018 00:06:08 +0100

In the previous article we looked at the Beta-Bernoulli model. This time we’ll extend it to a model with +multiple possible outcomes. We’ll also take a look at the Dirichlet, +Categorical and Multinomial distributions.

After this, we’ll be quite close to implementing interesting models such as the +Latent Dirichlet Allocation (LDA). But for now, we have to understand the +basics first.

Multinomial coefficients

Before we can dive into the Dirichlet-Categorical model we have to briefly look at the multinomial coefficient, which is the generalization of a binomial coefficient. First, here’s a definition of the binomial coefficient

$$ +\binom{n}{k} = \frac{n!}{k! (n - k)!} +$$

which represents the number of ways we can choose $k$ items out of $n$ total.

We can generalize this to more than two types of items using the multinomial +coefficient defined as

$$ +\binom{n}{k_1, k_2, \ldots, k_m} = \frac{n!}{k_1! k_2! \ldots k_m!}. +$$

which represents the number of ways we can split $n$ items into $m$ groups, +with $k_1$ items in the first group, $k_2$ items in the second group, and so +on.
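Both coefficients are easy to compute with integer arithmetic. A minimal sketch (the function name is made up for illustration; `math.comb` requires Python 3.8 or newer):

```python
import math


def multinomial_coefficient(ks):
    """n! / (k_1! k_2! ... k_m!) with n = sum(ks), using exact integers."""
    result = math.factorial(sum(ks))
    for k in ks:
        result //= math.factorial(k)  # each division is exact
    return result


print(multinomial_coefficient([2, 1, 1]))  # 4! / (2! 1! 1!) = 12
# With two groups we get back the binomial coefficient.
print(multinomial_coefficient([3, 2]) == math.comb(5, 3))  # True
```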

Categorical distribution

Now that we are comfortable with multinomial coefficients, let us continue with +the generalization of the Bernoulli distribution, that is the Categorical +distribution, denoted as $Cat(\boldsymbol{\pi})$, where $\boldsymbol\pi$ is a +vector of probabilities for each possible outcome. The probability mass +function (PMF) is simply

$$ +p(x|\boldsymbol \pi) = \prod_{i=1}^k \pi_i^{I[x = i]} +$$

where $I[x = i]$ is an indicator variable which evaluates to $1$ if $x=i$, or +to $0$ otherwise. Note that we require $\sum_{i=1}^k \pi_i = 1$.

We can also re-formulate this for the case of one-of-K encoding, where only one +of the outcomes is $1$, and the remaining elements equal $0$. Then the +distribution becomes

$$ +p(x|\boldsymbol\pi) = \prod_{i=1}^k \pi_i^{x_i}. +$$

An example of this would be a single roll of a die, where only one of the outcomes is possible, but each might have a different probability (unfair die).

Multinomial distribution

Having understood the Categorical distribution, we can now move to the generalization of the Binomial distribution to multiple outcomes, that is the Multinomial distribution. An easy way to think of it is $n$ rolls of a $k$-sided die.

When $n = 1$ and $k = 2$ we have a Bernoulli distribution.
When $n = 1$ and $k > 2$ we have a Categorical distribution.
When $n > 1$ and $k = 2$ we have a Binomial distribution.
And finally, when $n > 1$ and $k > 2$ we have a Multinomial distribution.

Of course we can simply always use the Multinomial distribution as it is the most general. Generalizing the one-of-K form from a single outcome to counts, the PMF is then simply

$$
p(\boldsymbol{x} | \boldsymbol{\pi},n) = \frac{n!}{x_1!x_2! \ldots x_k!} \prod_{i=1}^k \pi_i^{x_i}
$$

In this case $\boldsymbol{x} = (x_1, \ldots, x_k)$ represents the number of times each outcome was observed, while again $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_k)$ represents the probabilities of each outcome.

An example of the multinomial distribution is the Bag of Words model, which describes the number of occurrences of each word in a dataset. There are $k$ possible words in a dictionary and the document consists of $n$ words in total.
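To make the PMF concrete, here is a small sketch that evaluates it for a three-word dictionary and checks that the probabilities of all possible count vectors sum to one. The function name, the word probabilities and the document length are assumptions for illustration.

```python
import math
from itertools import product


def multinomial_pmf(x, pi):
    """n! / (x_1! ... x_k!) * prod_i pi_i^x_i with n = sum(x)."""
    coef = math.factorial(sum(x))
    for xi in x:
        coef //= math.factorial(xi)  # exact integer division
    prob = float(coef)
    for xi, p in zip(x, pi):
        prob *= p ** xi
    return prob


pi = [0.5, 0.3, 0.2]  # hypothetical word probabilities
n = 3                 # words in the document

# Word 1 twice, word 2 once: 3 * 0.5^2 * 0.3, roughly 0.225.
print(multinomial_pmf([2, 1, 0], pi))

# Summing over every count vector with n words in total gives roughly 1.
outcomes = [x for x in product(range(n + 1), repeat=len(pi)) if sum(x) == n]
print(sum(multinomial_pmf(x, pi) for x in outcomes))
```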

Dirichlet distribution

Lastly, let us consider the Dirichlet distribution, which is a generalization of the Beta distribution to more than two outcomes. The Dirichlet distribution is to the Categorical/Multinomial what the Beta is to the Bernoulli/Binomial.

A random vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_k)$ with $\sum_{i=1}^k \pi_i = 1$ and $\pi_i \in (0; 1)$ has a Dirichlet distribution with a PDF

$$
Dir(\boldsymbol{\pi} | \alpha_1, \ldots, \alpha_k) = \frac{\Gamma(\sum_{i=1}^k \alpha_i)}{\prod_{i=1}^k \Gamma(\alpha_i)} \prod_{i=1}^k \pi_i^{\alpha_i - 1}.
$$

Just like we did with the Beta distribution, we can simplify things by naming the normalization constant, as it can be computed in closed form from the parameters, that is

$$
Dir(\boldsymbol{\pi} | \alpha_1, \ldots, \alpha_k) =
\frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^k \pi_i^{\alpha_i - 1}
$$

where

$$ +B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^k \alpha_i)}. +$$

Note that for shorthand we will also write $\boldsymbol\alpha = (\alpha_1, +\ldots, \alpha_k)$, giving us a shorter notation when writing +$Dir(\boldsymbol\pi | \boldsymbol\alpha)$.

Dirichlet-Categorical Model

Similarly to the previous article about the Beta-Bernoulli model, we will now introduce the Dirichlet-Categorical model. Since everything is analogous we won’t go into that much detail. The Dirichlet distribution is a conjugate prior to the Categorical and Multinomial distributions, which means if we set our prior to Dirichlet and our likelihood to Categorical or Multinomial, the resulting posterior will again be a Dirichlet distribution.

We can observe this easily by just multiplying the probability mass function $Cat(\boldsymbol x | \boldsymbol \pi)$ by the density $Dir(\boldsymbol\pi|\boldsymbol\alpha)$, that is

$$ +Cat(\boldsymbol x | \boldsymbol \pi) Dir(\boldsymbol\pi|\boldsymbol\alpha) +\propto \prod_{i=1}^k \pi_i^{x_i} \prod_{i=1}^k \pi_i^{\alpha_i - 1}. +$$

Since only one of the $x_i$ in the Categorical distribution can be $1$ and the +rest are $0$, say $x_j =1 $, then this will get multiplied by the respective +$\pi_j$ in the Dirichlet distribution and we can immediately see that +$\alpha_j$ will be increased by one, giving us a new Dirichlet distribution +with a parameter $(\alpha_1, \ldots, \alpha_j + 1, \ldots, \alpha_k)$.

Beta Distribution and the Beta-Bernoulli Model

Sat, 01 Dec 2018 19:08:14 +0100

The Beta distribution is a parametric distribution defined on the interval $[0; +1]$ with two positive shape parameters, denoted $\alpha$ and $\beta$. Probably +the most common use case is using Beta as a distribution over probabilities, as +in the case of the parameter of a Bernoulli random variable. Even more +importantly, the Beta distribution is a conjugate prior for the Bernoulli, +binomial, negative binomial and geometric distributions.

The PDF of the Beta distribution, for $x \in [0; 1]$ is defined as

$$ +p(x | \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1} +$$

where $B(\alpha, \beta)$ is the normalizing constant which can be directly +computed from the parameters using the gamma function (denoted $\Gamma$ and +defined via an integral $\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\ dx$) as +follows

$$ +B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}. +$$

This gives us the complete form of the PDF

$$ +Beta(x | \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) +\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}. +$$

Because of the conjugacy, we rarely have to worry about the normalizing constant +and can simply compute it in closed form.

As a small aside, let us compute the expectation of a Beta random variable $X +\sim Beta(\alpha, \beta)$. Note that the support of the Beta distribution is +$[0; 1]$, which means we’re only integrating over that interval.

$$ +\begin{align} +\mu = E[X] &= \int_0^1 x p(x | \alpha, \beta)\ dx \\\\ +&= \int_0^1 x \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}\ dx \\\\ +&= \frac{1}{B(\alpha, \beta)}\int_0^1 x^{\alpha} (1 - x)^{\beta - 1}\ dx \\\\ +\end{align} +$$

Here we make use of a simple trick. Since $B(\alpha, \beta)$ is the normalizing +constant, it must hold that the integral over an unnormalized $Beta(\alpha, \beta)$ +distribution is exactly $B(\alpha, \beta)$, that is

$$ +\int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\ dx = B(\alpha, \beta). +$$

If we look at the integral we got in the previous expression, it is very similar, +except the $\alpha$ instead of $\alpha - 1$. But that is ok, it simply corresponds to +$B(\alpha + 1, \beta)$. We can plug this back in and get

$$ +\begin{align} +\mu &= \frac{B(\alpha + 1, \beta)}{B(\alpha, \beta)} \\\\ +&= \frac{\Gamma(\alpha + 1)\Gamma(\beta)}{\Gamma(\alpha + 1 + \beta)} +\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \\\\ +&= \frac{\alpha \Gamma(\alpha)\Gamma(\beta)}{(\alpha + \beta)\Gamma(\alpha + +\beta)} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \\\\ +&= \frac{\alpha}{\alpha + \beta} \\\\ +\end{align} +$$

using the identity $\Gamma(x + 1) = x \Gamma(x)$.
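
As a quick sanity check of the result $\mu = \frac{\alpha}{\alpha + \beta}$, here is a small sketch that approximates the integral $\int_0^1 x\ p(x | \alpha, \beta)\ dx$ numerically and compares it with the closed form. The function names (`beta_fn`, `beta_pdf`, `beta_mean_numeric`) and the chosen values of $\alpha$ and $\beta$ are only illustrative, and the midpoint rule is just one simple way to approximate the integral.

```python
import math


def beta_fn(a, b):
    """Normalizing constant B(a, b) computed from the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x, a, b):
    """PDF of Beta(a, b) at a point x in [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def beta_mean_numeric(a, b, n=100_000):
    """Midpoint-rule approximation of E[X] = int_0^1 x p(x) dx."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * beta_pdf(x, a, b)
    return total * h


alpha, beta = 2.0, 5.0  # arbitrary example parameters
numeric = beta_mean_numeric(alpha, beta)
closed_form = alpha / (alpha + beta)
print(numeric, closed_form)
```

Both numbers should agree up to the discretization error of the midpoint rule.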

Beta-Bernoulli model

Let us now show a simple example where we make use of the conjugacy between +Beta and Bernoulli distributions.

Consider a random variable representing the outcome of a single coin toss, +which has a Bernoulli distribution with a parameter $\theta$ (probability of heads). +Before we observe the coin toss, we might have some prior belief about the +fairness of the coin. Let us set the prior belief as if we’ve seen 1 head and 1 +tail before tossing the coin, that is $Beta(1, 1)$.

Because Bayesian inference models uncertainty directly, this does not mean that +we believe the coin is fair, even though the maximum likelihood estimate of +$\theta$ for these two coin tosses would be $0.5$. We are however interested in +computing the full posterior over $\theta$, that is $p(\theta | D)$ where $D$ +is our observed data. Using Bayes theorem we get

$$ +p(\theta | D) = \frac{p(\theta | \alpha, \beta) p(D | \theta)}{p(D)}. +$$

Now knowing that the Beta distribution is a conjugate prior for the Bernoulli +distribution, and given that our prior is Beta and our likelihood is Bernoulli, +we know that our posterior must be a Beta distribution as well. We can thus +omit the normalizing constant $p(D)$ since we can infer it from the computed +parameters from multiplying the prior by the likelihood.

Let’s say we toss the coin once and observe heads. We can write the likelihood

$$ +p(D | \theta) = \theta +$$

and putting this together with the prior

$$ +p(\theta | \alpha, \beta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} +$$

we can compute the posterior

$$ +p(\theta | D) \propto \theta\theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{(\alpha - 1) + 1} (1 - \theta)^{\beta - 1} \propto Beta(\theta | \alpha + 1, \beta). +$$

As you can see, multiplying the likelihood and the prior again gives a distribution which has exactly the same shape as a Beta distribution. We can thus infer the normalizing constant to be $B(\alpha + 1, \beta)$ and write our full posterior in closed form

$$ +p(\theta | D) = \frac{1}{B(\alpha + 1, \beta)} \theta^{\alpha} (1 - \theta)^{\beta - 1} +$$

If we observed tails, the likelihood would be $p(D | \theta) = 1 - \theta$ +since $\theta$ is the probability of heads. Plugging this back into the +previous formula we can easily see that the resulting distribution would be +$Beta(\alpha, \beta + 1)$.

The Beta distribution basically acts as a counter. With every newly observed +coin toss it gets added to our existing prior belief to compute the posterior, +which then can become a prior for the next coin toss, but with our belief updated. +This is a simple example of how Bayesian models can be updated on-line as new data +comes in.
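
The counter interpretation can be sketched in a few lines. This is only a minimal illustration, assuming tosses are encoded as `1` for heads and `0` for tails; the helper name `update` and the example sequence of tosses are made up for this sketch.

```python
def update(alpha, beta, toss):
    """Posterior Beta parameters after one Bernoulli observation.

    toss == 1 (heads) increments alpha, toss == 0 (tails) increments beta.
    """
    if toss == 1:
        return alpha + 1, beta
    return alpha, beta + 1


alpha, beta = 1, 1        # the Beta(1, 1) prior
tosses = [1, 0, 1, 1]     # hypothetical observations: heads, tails, heads, heads

for toss in tosses:
    # The posterior after each toss becomes the prior for the next one.
    alpha, beta = update(alpha, beta, toss)

posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)
```

After three heads and one tail the posterior is $Beta(4, 2)$, and its mean follows from the formula for $E[X]$ derived earlier.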

The Gaussian Distribution - Basic Properties

Sat, 01 Dec 2018 02:59:25 +0100

The Gaussian distribution has many interesting properties, many of which make +it useful in various different applications. Before moving further, let us just +define the univariate PDF with a mean $\mu$ and variance $\sigma^2$

$$ +\mathcal{N}(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right). +$$

In the general multi-dimensional case, the mean becomes a mean vector, and the variance turns into a $D \times D$ covariance matrix. The PDF then becomes

$$
\mathcal{N}(\mathbf{x} | \mathbf{\mu}, \mathbf{\Sigma}) = \frac{1}{\sqrt{(2 \pi)^D \det(\mathbf{\Sigma})}}
\exp \left( -\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \right)
$$

where $\det(\mathbf{\Sigma})$ is the determinant of the covariance matrix $\mathbf{\Sigma}$. The quadratic form $(\mathbf{x} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu})$ in the exponent is the squared Mahalanobis distance and is useful to study in more detail.

Affine property

The first property of the Gaussian states that if $X \sim \mathcal{N}(\mu, +\Sigma)$, then $Y = A X + b$ is also a Gaussian, specifically $Y \sim +\mathcal{N}(A \mu + b, A \Sigma A^T)$. We can prove this using the definition of +mean and covariance. The mean of $Y$ (denoted $\mu_Y$) can be derived simply +from the linearity of expectation, that is

$$ +\mu_Y = E[Y] = E[A X + b] = E[A X] + E[b] = A E[X] + b = A \mu + b. +$$

For the covariance $\Sigma_Y$ we again substitute into the definition of covariance and get

$$ +\begin{align} +\Sigma_Y &= E[(Y - \mu_Y) (Y - \mu_Y)^T] \\\\ +&= E[((A X + b) - (A \mu + b)) ((A X + b) - (A \mu + b))^T] \\\\ +&= E[(A(X - \mu)) (A (X - \mu))^T] \\\\ +&= E[A (X - \mu) (X - \mu)^T A^T] \\\\ +&= A E[(X - \mu) (X - \mu)^T] A^T \\\\ +&= A \Sigma A^T +\end{align} +$$

and thus $\Sigma_Y = A \Sigma A^T$, which gives the final result of

$$
Y \sim \mathcal{N}(A \mu + b, A \Sigma A^T).
$$

Sampling from a Gaussian

We can immediately make use of the affine property to define how to sample from +a multivariate Gaussian. We’ll make use of Cholesky decomposition, which for +a positive-definite matrix $\Sigma$ returns a lower triangular matrix $L$, such +that

$$ +L L^T = \Sigma. +$$

This together with the affine property defined above gives us

$$ +\mathcal{N}(\mu, \Sigma) = \mu + L \mathcal{N}(0, I). +$$

Sampling from the former is thus equivalent to sampling from the latter. Since +$\mu$ and $L$ are constant factors with respect to sampling, we simply have to +figure out how to draw samples from $\mathcal{N}(0, I)$ and then do the affine +transform back to our original distribution.

Observe that since the covariance of $\mathcal{N}(0, I)$ is diagonal, the individual values in the random vector are uncorrelated, and because they are jointly Gaussian, they are also independent. Note that this implication is special to the Gaussian and is a little bit tricky: uncorrelated random variables are not independent in general, but random variables which are jointly Gaussian and uncorrelated are.

Finally, because the variables are independent, we can sample them independently, +which can be done easily using the Box-Muller transform. +Once we obtain our $D$ independent samples, we simply multiply by $L$ and add $\mu$ +to obtain correlated samples from our original distribution.
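
The sampling recipe above can be sketched with NumPy as follows. This is a rough illustration under a few assumptions: the helper name `sample_gaussian` and the example $\mu$ and $\Sigma$ are made up, and NumPy's `standard_normal` is used as a stand-in for a hand-written Box-Muller transform.

```python
import numpy as np


def sample_gaussian(mu, sigma, n, rng):
    """Draw n samples from N(mu, sigma) via the Cholesky factor of sigma."""
    L = np.linalg.cholesky(sigma)           # lower triangular, L @ L.T == sigma
    z = rng.standard_normal((n, len(mu)))   # each row is a sample from N(0, I)
    return mu + z @ L.T                     # each row is mu + L z


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

samples = sample_gaussian(mu, sigma, 200_000, rng)
print(samples.mean(axis=0))   # should be close to mu
print(np.cov(samples.T))      # should be close to sigma
```

With enough samples the empirical mean and covariance should approach $\mu$ and $\Sigma$, which is exactly what the affine property predicts.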

Sum of two independent Gaussians is a Gaussian

If $X$ and $Y$ are independent random variables with Gaussian distributions, where $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$, then

$$ +X + Y \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2). +$$

This can be proven many different ways, the simplest of which is probably using +moment generating functions. With the moment generating function of a Gaussian +being

$$ +M_X(t) = \exp \left( t\mu + \frac{1}{2} \sigma^2 t^2 \right), +$$

and using the property of moment generating functions which says how to combine +two independent variables $X$ and $Y$, specifically

$$ +M_{X + Y}(t) = M_X(t) M_Y(t), +$$

we can simply plug in our moment generating function for the Gaussian and get +our result

$$ +\begin{align} +M_{X + Y}(t) &= M_X(t) M_Y(t) \\\\ +&= \exp \left( t\mu_X + \frac{1}{2} \sigma_X^2 t^2 \right) \exp \left( t\mu_Y + \frac{1}{2} \sigma_Y^2 t^2 \right) \\\\ +&= \exp \left( t(\mu_X + \mu_Y) + \frac{1}{2} t^2 (\sigma_X^2 + \sigma_Y^2) \right) +\end{align} +$$

which is exactly the moment generating function of $\mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.

Deriving the normalizing constant

We can compute the Gaussian integral using polar coordinates. Consider the simplest case $\int_{-\infty}^\infty e^{-x^2} dx$, from which the general normalizing constant follows by a change of variables.

$$ +\begin{align} +\left( \int_{-\infty}^\infty e^{-x^2} dx \right)^2 &= +\int_{-\infty}^\infty e^{-x^2} dx \int_{-\infty}^\infty e^{-x^2} dx \\\\ +&= \int_{-\infty}^\infty e^{-x^2} dx \int_{-\infty}^\infty e^{-y^2} dy \qquad \text{rename $x$ to $y$}\\\\ +&= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)} dx\ dy +\end{align} +$$

And now comes an important trick, we'll do a polar coordinate substitution with $r^2 = x^2 + y^2$, since then $e^{-(x^2 + y^2)} = e^{-r^2}$ and $dx\ dy = r\ dr\ d\theta$ in $\mathbb{R}^2$.

$$ +\begin{align} +&= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)} dx\ dy\\\\ +&= \int_0^{2\pi} \int_0^\infty e^{-r^2} r\ dr\ d\theta \\\\ +&= 2\pi \int_0^\infty e^{-r^2} r\ dr \\\\ +\end{align} +$$

now substituting $s = -r^2$ and $ds = -2 r\ dr$, so that $s$ runs from $0$ to $-\infty$, giving us

$$
\begin{align}
&= 2\pi \int_0^\infty e^{-r^2} r\ dr \\\\
&= 2\pi \int_0^{-\infty} -\frac{1}{2} e^s\ ds \\\\
&= \pi \int_0^{-\infty} -e^s\ ds \qquad\text{flipping integration bounds} \\\\
&= \pi \int_{-\infty}^0 e^s\ ds \\\\
&= \pi (e^0 - e^{-\infty}) \\\\
&= \pi
\end{align}
$$

Finally, combining this with the initial integral we get

$$ +\left( \int_{-\infty}^\infty e^{-x^2} dx \right)^2 = \pi +$$

and as a result

$$ +\int_{-\infty}^\infty e^{-x^2} dx = \sqrt{\pi}. +$$
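
The value $\sqrt{\pi}$ is easy to check numerically. The following is only a rough sketch: the function name `gaussian_integral`, the truncation of the integration range to $[-10, 10]$ and the midpoint rule are choices made for this illustration.

```python
import math


def gaussian_integral(limit=10.0, n=200_000):
    """Midpoint-rule approximation of the integral of exp(-x^2) over [-limit, limit]."""
    h = 2 * limit / n
    total = 0.0
    for i in range(n):
        x = -limit + (i + 0.5) * h
        total += math.exp(-x * x)
    return total * h


approx = gaussian_integral()
print(approx, math.sqrt(math.pi))
```

The tails beyond $|x| = 10$ contribute a negligible amount, so truncating the range does not visibly change the result.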

Deriving the mean and standard deviation

Lastly, while not necessarily a property of the Gaussian, it is a useful exercise to derive the mean and variance from the PDF. Once again, the PDF is

$$ +p(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right) +$$

and the general formula for $E[X]$ is

$$ +E[X] = \int_{-\infty}^\infty x p(x)\ dx = \int_{-\infty}^\infty x \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx. +$$

We can pull out the constant outside of the integral and substitute $u = x - +\mu$ and $du = dx$, giving us

$$ +\begin{align} +&= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty (u + \mu) \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \\\\ +&= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \left( \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) + +\mu \left( \int_{-\infty}^\infty \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) \right) \\\\ +&= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) + \mu \\\\ +\end{align} +$$

Here we note that the function being integrated is odd, which means the +integral adds up to $0$, and we’re left with only $\mu$, that is

$$ +E[X] = \mu +$$

which is what we wanted to prove.

Now for the variance, which is defined as

$$ +var(X) = E[(X - \mu)^2] +$$

which written again as an integral gives us

$$ +var(X) = \int_{-\infty}^\infty (x - \mu)^2 p(x)\ dx = \int_{-\infty}^\infty (x - \mu)^2 \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx. +$$

again pulling out the constant and substituting $u = x - \mu$ and $du = dx$ we get

$$ +\begin{align} +var(X) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty u^2 \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du. +\end{align} +$$

Renaming the integration variable to $y$, we integrate by parts using $\int u\ v' = u\ v - \int v\ u'$ where we set

$$
\begin{align}
u &= y \\\\
u' &= 1 \\\\
v' &= y \cdot e^{-y^2 / 2\sigma^2}.
\end{align}
$$

To get $v$ we have to compute the integral of $v'$, which we can easily do substituting $t = -\frac{y^2}{2\sigma^2}$ and $dt = -\frac{y}{\sigma^2} dy$, giving us

$$
\begin{align}
\int y \cdot e^{-y^2 / 2\sigma^2}\ dy &= -\int \sigma^2 e^t\ dt \\\\
&= -\sigma^2 e^t \\\\
&= -\sigma^2 e^{-\frac{y^2}{2\sigma^2}}.
\end{align}
$$

Now finishing our integration by parts we can write out the final formula

$$
\begin{align}
var(X) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty u\ v' \\\\
&= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \left[y (-\sigma^2) e^{-\frac{y^2}{2\sigma^2}}\right]_{-\infty}^\infty - \int_{-\infty}^\infty (-\sigma^2) e^{-\frac{y^2}{2\sigma^2}} \ dy \right) \\\\
&= 0 + \sigma^2 \cdot \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty e^{-\frac{y^2}{2\sigma^2}} \ dy \\\\
&= 0 + \sigma^2 \cdot 1 = \sigma^2.
\end{align}
$$

That is, $var(X) = \sigma^2$.

Graphical Models: D-Separation

Thu, 29 Nov 2018 20:51:24 +0100

$$ +\newcommand{\bigci}{\perp\mkern-10mu\perp} +$$

This article is a brief overview of conditional independence in graphical models, and the related d-separation. Let us begin with a definition.

For three random variables $X$, $Y$ and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ iff

$$ +p(X, Y | Z) = p(X | Z) p(Y | Z). +$$

We can use a shorthand notation

$$ +X \bigci Y | Z +$$

Before we can define d-separation, let us first show three different types of graphs. Consider the same three variables as before, we’ll be interested in conditional independence based on whether we observe $Z$.

Tail-tail

The first case is called the tail-tail.

Figure: the tail-tail graph $X \leftarrow Z \rightarrow Y$.

We can factor the joint distribution to get

$$ +p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z) +$$

and conditioning on the value of $Z$ we get (using the definition of conditional probability)

$$ +p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z). +$$

From this we can immediately see that conditioning on $Z$ in the tail-tail case makes $X$ and $Y$ independent, that is $X \bigci Y | Z$.

Head-tail

The second case is called the head-tail and looks as the following.

Figure: the head-tail graph $X \rightarrow Z \rightarrow Y$.

We can again write the joint distribution for the graph

$$ +p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) +$$

and again conditioning on $Z$ we get (using rules of conditional probability)

$$ +\begin{align} +p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\\\ +&= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\\\ +&= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\\\ +&= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\\\ +&= p(X | Z) p(Y | Z) +\end{align} +$$

and so again, $X$ and $Y$ are conditionally independent given $Z$, that is $X \bigci Y | Z$.

Checking marginal independence

For completeness, we can also check if $X$ and $Y$ are marginally independent, which in general they are not, since $X$ influences $Y$ through $Z$. We start again from the joint distribution

$$ +p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) +$$

which gives us the following when marginalizing over $Z$

$$ +p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X) +$$

from which we can immediately see it does not factorize into $p(X) p(Y)$ in the general case, and thus $X$ and $Y$ are not marginally independent.

Head-head

The last case is called the head-head and is a little bit tricky

Figure: the head-head graph $X \rightarrow Z \leftarrow Y$.

We can again write out the joint distribution

$$ +p(X, Y, Z) = p(X) p(Y) p(Z | X, Y), +$$

but this does not immediately help us when we try to condition on $Z$, we would want

$$ +p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z) +$$

which does not hold in general. For example, consider $X, Y \sim Bernoulli(0.5)$ and $Z = 1$ if $X = Y$, and $0$ otherwise. In this case if we know $Z$ and observe $X$, it immediately tells us the value of $Y$, hence $X$ and $Y$ are not conditionally independent given $Z$.

We can however do a little trick and write the $p(X, Y)$ as a marginalization over $Z$, that is

$$ +p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y) +$$

since $\sum_Z p(Z | X, Y) = 1$. As a result, in the head-head case we have marginal independence between $X$ and $Y$, that is $X \bigci Y$.
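
The head-head example with $X, Y \sim Bernoulli(0.5)$ and $Z = 1$ iff $X = Y$ is small enough to verify by brute-force enumeration. The sketch below is only illustrative; the helper `prob` and the dictionary layout of the joint distribution are assumptions of this example.

```python
from itertools import product

# Joint p(x, y, z): X and Y are independent fair coins, Z = 1 iff X == Y.
joint = {}
for x, y in product([0, 1], repeat=2):
    z = 1 if x == y else 0
    joint[(x, y, z)] = 0.25


def prob(**fixed):
    """Sum the joint over all outcomes that match the given values of x, y, z."""
    total = 0.0
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == value for name, value in fixed.items()):
            total += p
    return total


# Marginal independence: p(x, y) == p(x) p(y) for all x, y.
marginally_independent = all(
    abs(prob(x=x, y=y) - prob(x=x) * prob(y=y)) < 1e-12
    for x, y in product([0, 1], repeat=2)
)

# Conditional independence: p(x, y | z) == p(x | z) p(y | z) for all x, y, z.
conditionally_independent = all(
    abs(
        prob(x=x, y=y, z=z) / prob(z=z)
        - (prob(x=x, z=z) / prob(z=z)) * (prob(y=y, z=z) / prob(z=z))
    ) < 1e-12
    for x, y, z in product([0, 1], repeat=3)
)

print(marginally_independent, conditionally_independent)
```

The enumeration confirms both claims: $X$ and $Y$ are marginally independent, but observing $Z$ couples them.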

D-separation

Having shown the three cases, we can finally define d-separation. Let $G$ be a DAG, and let $A, B, C$ be disjoint subsets of vertices.

A path between two vertices is blocked if it passes through a vertex $v$, such that either:

the edges are head-tail or tail-tail, and $v \in C$, or
the edges are head-head, and $v \not \in C$, and neither are any of its descendants.

We say that $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked w.r.t. $C$. And now comes the important part, if $A$ and $B$ are d-separated by $C$, then $A \bigci B\ |\ C$.

This might all look very complicated, but this property of directed graphical models is actually extremely useful, and d-separation is very easy to check quickly after seeing just a few examples.

Examples

To get a feel for d-separation, let us look at the following example ($B$ is observed).

+ +

We can immediately see that $A \bigci D | B$ since this is the head-tail case. We can also see that $A \not{\bigci} E | B$ (not conditionally independent), because while the path through $B$ is blocked, the path through $C$ is not.


Variational Inference - Deriving ELBO

Sat, 24 Nov 2018 21:20:11 +0100

This post describes two approaches for deriving the Evidence Lower Bound (ELBO) used in variational inference. Let us begin with a little bit of motivation.

Consider a probabilistic model where we are interested in maximizing the marginal likelihood $p(X)$ for which direct optimization is difficult, but optimizing complete-data likelihood $p(X, Z)$ is significantly easier.

In a Bayesian setting, we condition on the data $X$ and compute the posterior distribution $p(Z | X)$ over the latent variables given our observed data. This may however require approximate inference. There are two general approaches, sampling using MCMC, and optimization using variational inference.

The main idea behind variational inference is to consider a family of densities $\mathcal{Q}$ over the latent variables, and use optimization to find the $q(Z) \in \mathcal{Q}$ that best approximates our target posterior $p(Z | X)$. We measure the quality of the approximation using the Kullback-Leibler divergence, that is

$$ +q^*(Z) = {\arg\min}_{q(Z) \in \mathcal{Q}} KL(q(Z)\ ||\ p(Z | X)). +$$

However, optimizing the KL divergence directly is not tractable, because it requires us to compute the log posterior $\log p(Z | X)$, specifically

$$ +KL(q(Z)\ ||\ p(Z | X)) = -\mathrm{E}_q \left[\log \frac{p(Z | X)}{q(Z)} \right]. +$$

We can however do a bit of equation shuffling (note we omit the explicit density in the expectation since all of them are taken w.r.t $q$)

$$ +\begin{aligned} +KL(q(Z)\ ||\ p(Z | X)) &= -\mathrm{E} \left[\log \frac{p(Z | X)}{q(Z)} \right] \\\\ +&= \mathrm{E} \left[\log \frac{q(Z)}{p(Z | X)} \right] \\\\ +&= \mathrm{E} \left[\log q(Z) \right] - \mathrm{E} \left[\log p(Z | X) \right] \\\\ +&= \mathrm{E} \left[\log q(Z) \right] - \mathrm{E} \left[\log p(Z, X) \right] + \mathrm{E} \left[ \log p(X) \right] \\\\ +&= \mathrm{E} \left[\log \frac{q(Z)}{p(Z, X)} \right] + \log p(X) \\\\ +&= -\mathrm{E} \left[\log \frac{p(Z, X)}{q(Z)} \right] + \log p(X) \\\\ +\end{aligned} +$$

where $\mathrm{E}[\log p(X)] = \log p(X)$ because $\log p(X)$ does not depend on $Z$. Re-writing the equation and moving everything except for $\log p(X)$ to the right we get

$$ +\log p(X) = \mathrm{E} \left[\log \frac{p(Z, X)}{q(Z)} \right] + KL(q(Z)\ ||\ p(Z | X)). +$$

The first term on the right is usually called the evidence lower bound (ELBO, or variational lower bound). Let us denote it as

$$ +\mathcal{L}(q) = \mathrm{E} \left[\log \frac{p(Z, X)}{q(Z)} \right] +$$

giving us the final equation

$$
\log p(X) = \mathcal{L}(q) + KL(q(Z)\ ||\ p(Z | X)).
$$

Now comes the interesting part. The left-hand side $\log p(X)$ does not depend on $q$, so it stays constant as we optimize over $q$. And because the KL divergence between $q(Z)$ and $p(Z | X)$ is always non-negative, $\mathcal{L}(q)$ must be a lower bound on $\log p(X)$. As a result, because the sum on the right must stay equal to a constant, increasing $\mathcal{L}(q)$ must decrease $KL(q(Z)\ ||\ p(Z | X))$ by the same amount. But this is what we wanted all along!

If we find a way to maximize the ELBO, we are effectively minimizing the KL divergence between our approximate distribution $q(Z)$, and our target posterior distribution $p(Z | X)$. If we were to choose $q(Z) = p(Z | X)$, the KL divergence would be zero, and $\mathcal{L}(q) = \log p(X)$. This justifies maximizing the ELBO as an objective in variational inference.
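
The decomposition of $\log p(X)$ into the ELBO plus the KL divergence can be checked numerically on a tiny discrete model. The sketch below assumes a latent variable with three states and made-up values of the joint $p(X, Z)$ for one fixed observation; the names `p_xz`, `elbo` and `kl` are only illustrative.

```python
import numpy as np

# Hypothetical joint p(X = x_obs, Z = z) for z = 0, 1, 2 and one fixed observation.
p_xz = np.array([0.10, 0.25, 0.05])
log_px = np.log(p_xz.sum())        # log marginal likelihood log p(X)
posterior = p_xz / p_xz.sum()      # exact posterior p(Z | X)


def elbo(q):
    """E_q[log p(X, Z) - log q(Z)] for a discrete q."""
    return np.sum(q * np.log(p_xz / q))


def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return np.sum(q * np.log(q / p))


q = np.array([0.5, 0.3, 0.2])      # an arbitrary approximate posterior
gap = log_px - (elbo(q) + kl(q, posterior))

print(log_px, elbo(q), kl(q, posterior), gap)
```

For any valid $q$ the gap is zero up to floating point error, and plugging in the exact posterior makes the ELBO equal to $\log p(X)$.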

ELBO using Jensen’s inequality

The Jensen’s inequality will give us a bit of motivation behind the ELBO.

In simple terms, Jensen's inequality states that for a convex function $f$ and a random variable $X$ we get

$$
E[f(X)] \geq f(E[X]).
$$

Recall that we’re interested in

$$ +\log p(X) = \log \left( \sum_Z p(X, Z) \right). +$$

Introducing a new density $q(Z)$ on the latent variable $Z$ we can re-write the last equation as

$$ +\log \left( \sum_Z p(X, Z) \frac{q(Z)}{q(Z)} \right) = \log \left( \sum_Z q(Z) \frac{p(X, Z)}{q(Z)} \right) = \log \mathrm{E}_q \left[ \frac{p(X, Z)}{q(Z)} \right]. +$$

Since $\log$ is concave rather than convex, Jensen's inequality holds with the direction reversed. Applying it, we immediately arrive at the ELBO as a lower bound, since

$$ +\log p(X) = \log \mathrm{E}_q \left[ \frac{p(X, Z)}{q(Z)} \right] \geq \mathrm{E}_q \left[ \log \frac{p(X, Z)}{q(Z)} \right] = \mathcal{L}(q). +$$

Note that we got exactly the same expression for $\mathcal{L}(q)$ as above, showing once more that $\mathcal{L}$ is indeed a lower bound on $\log p(X)$.


Bellman Equation

Thu, 22 Nov 2018 17:39:21 +0200

Before we begin, let me just define a few terms:

$S_t$ is the state at time $t$.
$A_t$ is the action performed at time $t$.
$R_t$ is the reward received at time $t$.
$G_t$ is the return, that is the sum of discounted rewards received from time $t$ onwards, defined as $G_t = \sum_{i=0}^\infty \gamma^i R_{t+i+1}$ .
$V^\pi(s)$ is the value of a state when following a policy $\pi$, that is the expected return when starting in state $s$ and following a policy $\pi$, defined as $V^\pi(s) = E[G_t | S_t = s]$ .
$Q^\pi(s, a)$ is the value of a state $s$ when performing an action $a$ and then following the policy $\pi$, that is $Q^\pi(s, a) = E[G_t | S_t = s, A_t = a]$.

Before moving further, note a small algebraic trick for re-writing $G_t$ in terms of itself

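$$
\begin{align}
G_t &= \sum_{i=0}^\infty \gamma^i R_{t+i+1} \\\\
&= R_{t+1} + \gamma \sum_{i=0}^\infty \gamma^i R_{t+i+2} \\\\
&= R_{t+1} + \gamma G_{t+1}
\end{align}
$$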
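
The recursion $G_t = R_{t+1} + \gamma G_{t+1}$ also gives a convenient way to compute returns by walking backwards through a finite sequence of rewards. The sketch below assumes an episode that ends after the last reward (so the return after it is zero), and the function name `discounted_returns` and the example rewards are made up for illustration.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ...] where rewards[t] holds R_{t+1}.

    Uses G_t = R_{t+1} + gamma * G_{t+1}, with the return after the
    final reward assumed to be zero (a finite episode).
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


rewards = [1.0, 0.0, 2.0]   # hypothetical R_1, R_2, R_3
gamma = 0.5
returns = discounted_returns(rewards, gamma)

# Direct evaluation of the definition for G_0, for comparison.
g0_direct = sum(gamma ** i * r for i, r in enumerate(rewards))
print(returns, g0_direct)
```

Computing $G_0$ directly from the definition gives the same value as the backward recursion.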

We can use this in the definition of $V^\pi(s)$ and get

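$$
\begin{align}
V^\pi(s) &= E[G_t | S_t = s] \\\\
&= E[R_{t+1} + \gamma G_{t+1} | S_t = s] \\\\
&= E[R_{t+1} + \gamma V^\pi(S_{t+1}) | S_t = s]
\end{align}
$$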

The last equation is called the Bellman equation for $V^\pi(s)$ and it shows a recursive relationship between the value of the current state and the possible next states. This in and of itself is not as interesting, but we’ll use it to derive a solution to finding the optimal policy.

Let us now define the optimal value function, that is the value function of the optimal policy (denoted $\pi^*$).

$$
V^*(s) = \max_\pi V^\pi(s)
$$

that is the optimal value of a state is the maximum over all possible policies. Going one step further, we also define the optimal action-value function as

$$
Q^*(s, a) = \max_\pi Q^\pi(s, a)
$$

It’s easy to see now that $V^*(s) = \max_a Q^*(s, a)$ , that is the maximum value of a state is computed by performing the best possible action. We can use this further to arrive at a simplified Bellman equation as follows

+ +

Here we managed to do a small but important trick in the second last equation (marked with $\stackrel{*}{=}$). But let us first decompose what $\mathrm{E}_{\pi^*}$ actually means. Because $G_t$ is the discounted sum of rewards, it is only defined in terms of a policy. But since we assume the policy to be stochastic, we need to take an expectation over all possible actions chosen by the policy, and the possible rewards.

This changes at the marked equation, because we are no longer referring to the policy $\pi^*$, but rather to the value function $V^*$, which is not stochastic. As a result, the expectation in the second to last equation is simply over $R_t$, because we still assume stochastic rewards.

References

Equations for the value function and notation borrowed from Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Linear Regression - least squares with orthogonal projection

Sun, 01 Jul 2018 10:47:11 +0200

Compared to the previous article where we simply used vector derivatives we’ll now try to derive the formula for least squares simply by the properties of linear transformations and the four fundamental subspaces of linear algebra. These are:

Kernel $Ker(A)$: The set of all solutions to $Ax = 0$. Sometimes we can say nullspace $N(A)$ instead of kernel.
Image $Im(A)$: The set of all right sides $b$, for which there is a solution $Ax = b$. We’ll show that this is equal to the column space $C(A)$, which is the span of the column vectors in $A$.
Row space $R(A)$: Span of the row vectors in $A$, sometimes also referred to as $Im(A^T)$ (the image of $A^T$). We can also refer to this as $C(A^T)$, because since $Im(A) = C(A)$, then $Im(A^T) = C(A^T) = R(A)$.
Left kernel $Ker(A^T)$ (or left nullspace): The set of all solutions to $A^T x = 0$. The name comes from left multiplying by $x$, specifically the set of solutions to $x^T A = 0^T$.

For this derivation we use the fact that $Ker(A) \perp R(A)$ and $Im(A) \perp Ker(A^T)$.

When $A$ is not invertible (it could be rectangular), there is in general no exact solution to $Ax = b$, because $b$ may have a component in $Ker(A^T)$, which is outside the range of $A$ (literally). We can define $b = b_i + b_n$ where $b_i$ is the orthogonal projection of $b$ onto $Im(A)$, and $b_n$ is the orthogonal projection of $b$ onto $Ker(A^T)$. In other words, $b_i \perp b_n$.

The above is valid, because $Im(A) \perp Ker(A^T)$, and because $Im(A)$ and $Ker(A^T)$ together span the whole space that our linear mapping $A$ maps into. Since $b_i \in Im(A)$, there is an $x$ such that $Ax = b_i$. Now just using basic algebra:

$$ +\begin{align} +b &= b_i + b_n \\\\ +b &= Ax + b_n & \text{left multiply by $A^T$} \\\\ +A^T b &= A^T A x + A^T b_n & \text{since $b_n \in Ker(A^T)$, we know $A^T b_n = 0$} \\\\ +A^T b &= A^T A x & \text{$A^T A$ is invertible, see note below} \\\\ +(A^T A)^{-1} A^T b &= x & \text{finally, we get the normal equation} +\end{align} +$$

Here we used the fact that $A^T A$ is always a symmetric positive semi-definite matrix, and in case we have linearly independent columns, it is actually positive-definite, which means it is also invertible. This is actually easy to show.
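
The derivation can be checked numerically. The sketch below uses a random tall matrix as an assumed example (its columns are linearly independent with probability one), solves the normal equation, and verifies that the leftover component $b_n$ really lies in $Ker(A^T)$. Comparing against `np.linalg.lstsq` is just an extra sanity check, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # hypothetical tall matrix with independent columns
b = rng.standard_normal(20)

# Normal equation (A^T A) x = A^T b, solved without forming the inverse explicitly.
x = np.linalg.solve(A.T @ A, A.T @ b)

b_i = A @ x        # orthogonal projection of b onto Im(A)
b_n = b - b_i      # remaining component, should lie in Ker(A^T)

x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(A.T @ b_n)           # approximately the zero vector
print(b_i @ b_n)           # approximately zero, since b_i is orthogonal to b_n
print(np.allclose(x, x_lstsq))
```

Using `np.linalg.solve` instead of computing $(A^T A)^{-1}$ is a common numerical choice, and gives the same $x$ here.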

First we show that $A^T A$ is symmetric. This is easy to see, because $(A^T A)_{ij}$ is just the dot product of the $i$-th row of $A^T$ with the $j$-th column of $A$. Note that the $i$-th row of $A^T$ is actually the $i$-th column of $A$. From this we see that $(A^T A)_{ij} = (A^T A)_{ji}$, because the dot product is symmetric.

Now we show that $A^T A$ is positive semi-definite. For an arbitrary $n \times n$ matrix $M$, we say that $M$ is positive semi-definite if and only if $x^T M x \geq 0$ for all $x \in \mathbb{R}^n$. We can directly substitute $A^T A$ and use the same trick as below:

$$ +x^T A^T A x = (A x)^T A x = ||A x||^2 \geq 0 +$$

Since $A^T A$ satisfies the definition directly, it is positive-semidefinite. $\square$

There is also another very nice way to show that $A^T A$ is invertible, without showing that it is positive semi-definite.

Lemma $Ker(A^T A) = Ker(A)$.

Starting with $Ker(A^T A) \supseteq Ker(A)$, this follows immediately from $Ax = 0 \implies A^T (Ax) = 0$.

Next $Ker(A^T A) \subseteq Ker(A)$: $A^T A x = 0$, left multiply by $x^T$ and we get:

$$ +0 = x^T A^T A x = (A x)^T A x = || Ax ||^2. +$$

Since the $L_2$ norm of a vector is zero only if the vector itself is zero, any vector $x$ for which $A^T A x = 0$ also satisfies $|| Ax ||^2 = 0$, which can only be true when $A x = 0$, and hence $x \in Ker(A)$. $\square$

Because $Ker(A^T A) = Ker(A)$, we also know that $rank(A^T A) = rank(A)$, which means if $A$ has linearly independent columns, $A^T A$ is invertible, because it has a full rank (this is because $A^T A$ is square and has the same number of rows/columns as $A$ has columns).

Matrix Inversion Lemma

Wed, 16 May 2018 23:12:35 +0200

This article is a draft and as such there might be typos and other inaccuracies.

In this article we'll derive the matrix inversion lemma, also known as the Sherman-Morrison-Woodbury formula. At first it might seem like a very boring piece of linear algebra, but it has a few nifty uses, as we'll see in one of the followup articles.

Let’s start with the following block matrix:

$$ +M = \begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix} +$$

We'll do an LDU decomposition in two different ways, which basically directly gives us the end formula. Eliminating the bottom left element we get the following:

$$
\begin{bmatrix}
I & 0 \\\\
-V A^{-1} & I
\end{bmatrix}
\begin{bmatrix}
A & U \\\\
V & B
\end{bmatrix} = \begin{bmatrix}
A & U \\\\
0 & B - V A^{-1} U
\end{bmatrix}
$$

The $B - V A^{-1} U$ is called a Schur complement and is generally defined as follows:

$$
M/A := B - V A^{-1} U
$$

We’ll use this notation later to make things easier to read. Moving on with the decomposition, we’ll now eliminate $U$.

$$ +\begin{bmatrix} +A & U \\\\ +0 & B - V A^{-1} U +\end{bmatrix} \begin{bmatrix} +I & -A^{-1} U \\\\ +0 & I +\end{bmatrix} = \begin{bmatrix} +A & 0 \\\\ +0 & B - V A^{-1} U +\end{bmatrix} +$$

Putting the two equations above together we get the following:

$$ +\underbrace{\begin{bmatrix} +I & 0 \\\\ +-V A^{-1} & I +\end{bmatrix}}_{X} +\underbrace{\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix}}_{M} \underbrace{\begin{bmatrix} +I & -A^{-1} U \\\\ +0 & I +\end{bmatrix}}_{Z} = \underbrace{\begin{bmatrix} +A & 0 \\\\ +0 & B - V A^{-1} U +\end{bmatrix}}_{W} +$$

We could also write the matrix $W$ using the Schur complement notation:

$$
W = \begin{bmatrix}
A & 0 \\\\
0 & B - V A^{-1} U
\end{bmatrix} = \begin{bmatrix}
A & 0 \\\\
0 & M/A
\end{bmatrix}
$$

Now we just express $M$ in terms of $X, Z, W$ and take the inverse to get $M^{-1}$.

$$ +\begin{align} +X M Z &= W \\\\ +M Z &= X^{-1} W \\\\ +M &= X^{-1} W Z^{-1} \\\\ +M^{-1} &= (X^{-1} W Z^{-1})^{-1} \\\\ +M^{-1} &= Z W^{-1} X +\end{align} +$$

Substituting our matrices back in, we get:

$$ +\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix}^{-1} += \begin{bmatrix} +I & -A^{-1} U \\\\ +0 & I +\end{bmatrix} +\begin{bmatrix} +A^{-1} & 0 \\\\ +0 & (M/A)^{-1} +\end{bmatrix} +\begin{bmatrix} +I & 0 \\\\ +-V A^{-1} & I +\end{bmatrix} +$$

Now comes the fun part, we’ll multiply out the right side of the equation:

$$ +\begin{align} +\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix}^{-1} +&= \begin{bmatrix} +I & -A^{-1} U \\\\ +0 & I +\end{bmatrix} +\begin{bmatrix} +A^{-1} & 0 \\\\ +0 & (M/A)^{-1} +\end{bmatrix} +\begin{bmatrix} +I & 0 \\\\ +-V A^{-1} & I +\end{bmatrix} \\\\ +&= \begin{bmatrix} +A^{-1} & -A^{-1} U (M/A)^{-1} \\\\ +0 & (M/A)^{-1} +\end{bmatrix} +\begin{bmatrix} +I & 0 \\\\ +-V A^{-1} & I +\end{bmatrix} \\\\ +&= \begin{bmatrix} +A^{-1} + A^{-1} U (M/A)^{-1} V A^{-1} & -A^{-1} U (M/A)^{-1} \\\\ +-(M/A)^{-1} VA^{-1} & (M/A)^{-1} +\end{bmatrix} \\\\ +&= \begin{bmatrix} +A^{-1} + A^{-1} U (B - V A^{-1} U)^{-1} V A^{-1} & -A^{-1} U (B - V A^{-1} U)^{-1} \\\\ +-(B - V A^{-1} U)^{-1} VA^{-1} & (B - V A^{-1} U)^{-1} +\end{bmatrix} +\end{align} +$$

That’s it for the first part, now we’ll do the same, but eliminating the top-right element from the left first.

$$ +\begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix} \begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix} = \begin{bmatrix} +A - UB^{-1}V & 0 \\\\ +V & B +\end{bmatrix} +$$

Here we get the other Schur complement, which we’ll note as $M/B = A - UB^{-1}V$. We can substitute it in straight away this time.

$$ +\begin{bmatrix} +M/B & 0 \\\\ +V & B +\end{bmatrix} \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix} = \begin{bmatrix} +M/B & 0 \\\\ +0 & B +\end{bmatrix} +$$

As before, we’ll write it out as a single equation:

$$ +\begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix} +\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix} \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix} = \begin{bmatrix} +M/B & 0 \\\\ +0 & B +\end{bmatrix} +$$

Now we express the matrix $M$ in terms of the other two (notice the newly added inverse signs):

$$ +\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix} = \begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix}^{-1} +\begin{bmatrix} +M/B & 0 \\\\ +0 & B +\end{bmatrix} \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix}^{-1} +$$

Lastly, we just take the inverse of both sides:

$$ +\begin{align} +\begin{bmatrix} +A & U \\\\ +V & B +\end{bmatrix}^{-1} &= \left( \begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix}^{-1} +\begin{bmatrix} +M/B & 0 \\\\ +0 & B +\end{bmatrix} \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix}^{-1} \right)^{-1} \\\\ +&= \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix} +\begin{bmatrix} +M/B & 0 \\\\ +0 & B +\end{bmatrix}^{-1} \begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix} & \text{notice the inverses cancelling out} \\\\ +&= \begin{bmatrix} +I & 0 \\\\ +-B^{-1}V & I +\end{bmatrix} +\begin{bmatrix} +(M/B)^{-1} & 0 \\\\ +0 & B^{-1} +\end{bmatrix} \begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix} \\\\ +&= +\begin{bmatrix} +(M/B)^{-1} & 0 \\\\ +-B^{-1} V (M/B)^{-1} & B^{-1} +\end{bmatrix} \begin{bmatrix} +I & -U B^{-1} \\\\ +0 & I +\end{bmatrix} \\\\ +&= +\begin{bmatrix} +(M/B)^{-1} & -(M/B)^{-1} UB^{-1} \\\\ +-B^{-1} V (M/B)^{-1} & B^{-1} V (M/B)^{-1} UB^{-1} + B^{-1} +\end{bmatrix} \\\\ +&= +\begin{bmatrix} +(A - UB^{-1}V)^{-1} & -(A - UB^{-1}V)^{-1} UB^{-1} \\\\ +-B^{-1} V (A - UB^{-1}V)^{-1} & B^{-1} V (A - UB^{-1}V)^{-1} UB^{-1} + B^{-1} +\end{bmatrix} +\end{align} +$$
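
Both expressions describe the same inverse $M^{-1}$, so their blocks must agree. Comparing the top-left blocks gives $(A - U B^{-1} V)^{-1} = A^{-1} + A^{-1} U (B - V A^{-1} U)^{-1} V A^{-1}$, which we can sanity check numerically. The sketch below uses small random matrices as an assumed example; they are shifted and scaled so that all the required inverses exist for this draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2

# Hypothetical blocks, kept close to a multiple of the identity so they are invertible.
A = 2.0 * np.eye(n) + 0.2 * rng.standard_normal((n, n))
B = 2.0 * np.eye(k) + 0.2 * rng.standard_normal((k, k))
U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((k, n))

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)

# Top-left block from the second decomposition: (M/B)^{-1}.
lhs = np.linalg.inv(A - U @ B_inv @ V)
# Top-left block from the first decomposition, using (M/A)^{-1}.
rhs = A_inv + A_inv @ U @ np.linalg.inv(B - V @ A_inv @ U) @ V @ A_inv

# Top-left block of the directly inverted block matrix M.
M = np.block([[A, U], [V, B]])
top_left = np.linalg.inv(M)[:n, :n]

print(np.allclose(lhs, rhs), np.allclose(top_left, lhs))
```

The same comparison can be made for the remaining three blocks.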

Linear Regression - Least Squares Without Orthogonal Projection

Tue, 24 Apr 2018 23:28:14 +0200

There are multiple ways one can arrive at the least squares solution to linear regression. I’ve always seen the one using orthogonality, but there is another way which I’d say is even simpler, especially if you’ve done any calculus. Let’s define the problem first.

Given an $N \times M$ matrix $X$ of inputs, and a vector $y$ of length $N$ containing the outputs, the goal is to find a weight vector $w$ of length $M$ such that:

$$X w \approx y$$

The reason we’re using $\approx$ instead of $=$ is that we’re not expecting to fit the line exactly through our training examples, as real world data will contain some form of noise.

To find a best possible fit we’ll create a loss function which tells us how well our line fits the data, and then try to minimize the loss. A common choice for regression is the sum of squared errors loss (denoted $L$), which is defined as:

$$L = \sum_{i = 1}^{N} \left( X_i w - y_i \right)^2$$

We can also write this in vector notation using a squared L2 norm:

$$L = || Xw - y ||^2$$

Now here comes the fun part. Because our loss $L$ is a convex function, any point where its derivative is zero is a global minimum, so we can solve for it analytically by simply taking a derivative with respect to $w$ and setting that equal to zero. Before we get into that, let’s re-write the loss $L$ to a form which is more suitable for differentiation:

$$
\begin{align}
L = || Xw - y ||^2 &= (Xw - y)^T (Xw - y) \\
&= (w^T X^T - y^T) (Xw - y) \\
&= w^T X^T X w - w^T X^T y - y^T X w + y^T y \\
&= w^T X^T X w - (y^T X w)^T - y^T X w + y^T y \\
&= w^T X^T X w - y^T X w - y^T X w + y^T y & \text{a scalar is equal to its own transpose} \\
&= w^T X^T X w - 2 y^T X w + y^T y
\end{align}
$$

Before moving any further, let us derive a few vector derivative rules (no pun intended). First, the $i$-th element of $Ax$ is defined as follows:

$$
\left( Ax \right)_i = \sum_{j=1}^{M} A_{ij} x_j = A_{i1} x_1 + A_{i2} x_2 + \dots + A_{iM} x_M
$$

Now if we take a derivative with respect to $x_j$ we’d get:

$$
\begin{align}
\frac{\partial}{\partial x_j} \left( Ax \right)_i &= \frac{\partial}{\partial x_j} \left(A_{i1} x_1 + A_{i2} x_2 + \dots + A_{iM} x_M \right) \\
&= \frac{\partial}{\partial x_j} \left(A_{i1} x_1 + A_{i2} x_2 + \dots + A_{ij} x_j + \dots + A_{iM} x_M \right) \\
&= \frac{\partial}{\partial x_j} A_{i1} x_1 + \frac{\partial}{\partial x_j} A_{i2} x_2 + \dots + \frac{\partial}{\partial x_j} A_{ij} x_j + \dots + \frac{\partial}{\partial x_j} A_{iM} x_M \\
&= 0 + 0 + \dots + \frac{\partial}{\partial x_j} A_{ij} x_j + \dots + 0 \\
&= A_{ij}
\end{align}
$$

So this means if we take the $i$-th element of $Ax$ and differentiate it with respect to the $j$-th element of $x$, we get back $A_{ij}$. As a result, we get a nice simple equation:

$$\frac{d}{dx} Ax = A$$

While nice, this doesn’t get us very far. We also need to figure out what happens in the case when the vector is on the left as a row vector, as in $x^T A$:

$$(x^T A)_i = \sum_{j = 1}^{M} x_j A_{ji}$$

Giving us the following partial derivative:

$$\frac{\partial}{\partial x_j} (x^T A)_i = A_{ji} = (A^T)_{ij} \quad \Rightarrow \quad \frac{d}{dx} x^T A = A^T$$

And finally the interesting part:

$$
\begin{align}
\frac{\partial x^T B x}{\partial x_i} &= \frac{\partial}{\partial x_i} \left( \sum_{j,k} x_j B_{jk} x_k \right) \\
&= \sum_{j} x_j B_{ji} + \sum_{k} B_{ik} x_k \\
&= \sum_{k} B^T_{ik} x_k + \sum_{k} B_{ik} x_k \\
&= \sum_{k} (B^T_{ik} + B_{ik}) x_k \\
&= \sum_{k} (B^T + B)_{ik} x_k
\end{align}
$$

Giving us the final:

$$\frac{d}{dx} x^T B x = (B^T + B) x$$

This is the derivative written as a column vector. Written as a row vector, to match the layout of $\frac{d}{dx} Ax = A$, it becomes $x^T (B + B^T)$.

Which means we can take our loss function and take the derivative with respect to $w$:

$$
\begin{align}
\frac{d}{dw} L &= \frac{d}{dw} (w^T X^T X w - 2 y^T X w + y^T y) \\
&= w^T ((X^T X)^T + X^T X) - 2 y^T X + 0 \\
&= w^T (X^T X + X^T X) - 2 y^T X & X^T X \ \text{is symmetric} \\
&= 2 w^T X^T X - 2 y^T X
\end{align}
$$

Now we want this to be equal to $0$ to find the minimum, which gives us the following equation (assuming $X^T X$ is invertible):

$$
\begin{align}
0 &= 2 w^T X^T X - 2 y^T X \\
2 w^T X^T X &= 2 y^T X \\
w^T &= y^T X (X^T X)^{-1} \\
w &= (X^T X)^{-1} X^T y & \text{transposing both sides}
\end{align}
$$

And there we go, it was a bit of work but we managed to derive the normal equation without the use of orthogonal projection.
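To see the normal equation in action, here is a small sketch for the simplest case of fitting a line $y \approx w_0 + w_1 x$. It assumes each row of $X$ is $[1, x_i]$, so $X^T X$ is a $2 \times 2$ matrix which can be inverted by hand. The function name `fitLine` is made up for this example.

```javascript
// Fits y ≈ w0 + w1 * x by solving w = (X^T X)^{-1} X^T y,
// where each row of X is [1, x_i].
function fitLine(xs, ys) {
    var n = xs.length;
    var sx = 0, sxx = 0, sy = 0, sxy = 0;

    for (var i = 0; i < n; i++) {
        sx += xs[i];
        sxx += xs[i] * xs[i];
        sy += ys[i];
        sxy += xs[i] * ys[i];
    }

    // X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    var det = n * sxx - sx * sx;
    if (det === 0) {
        return null; // X^T X is not invertible
    }

    // (X^T X)^{-1} = 1/det * [[sxx, -sx], [-sx, n]]
    var w0 = (sxx * sy - sx * sxy) / det;
    var w1 = (n * sxy - sx * sy) / det;
    return [w0, w1];
}

// Points lying exactly on y = 1 + 2x.
console.log(fitLine([0, 1, 2, 3], [1, 3, 5, 7])); // [ 1, 2 ]
```

With noisy data the points no longer lie on a single line, and the returned weights are the ones minimizing the sum of squared errors.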

Let's Write a 2D Platformer From Scratch Using HTML5 and JavaScript, Part 3: Collision Detection

Thu, 01 Feb 2018 10:00:00 +0100

This article is part 3 of the Let’s Write a 2D Platformer From Scratch Using HTML5 and JavaScript series.

Part 1: Game Loop
Part 2: Rendering
Part 3: Collision Detection


The previous post ended with a very ad-hoc solution to collision detection. We barely managed to get vertical movement working, and didn’t even get started on gravity. There’s a reason for that. Once the player starts moving vertically, we need something better than just checking against the top left corner, otherwise we immediately run into issues such as passing through walls below the player, as shown in this image:

What we really want is some concept of a collider, which represents the rigid body of an object. There is however one important distinction. We are only concerning ourselves with collision detection, not full rigid body physics, such as is implemented in Box2D. While physics engines provide a huge range of possibilities and features, they are also sometimes difficult to control, such as when using elevators, moving platforms, or climbing slopes. What we’ll do instead is implement simple horizontal and vertical raycasts, which together with a bit of code will provide great control over how the player moves in the world. Ray casting is basically like shooting a laser and seeing where it lands. It will allow us to see which object is the closest in a specific direction, and how far away it is.

Some platformers can benefit greatly from the use of a physics engine, such as the popular Little Big Planet. Other platformers, such as Super Mario Bros., which require highly precise controls, might be better off being implemented using raycasts, as they give the programmer more control. Physics engines are more inclined to work with things like forces, friction, velocities, etc., while the raycast approach works in terms of rules like “if the player is hugging a wall, they can press the jump button”, where hugging a wall is determined by doing a raycast on both sides of the player and seeing how far away the closest wall is.

To keep things simple we still constrain the game to box-shaped objects instead of arbitrary polygons. Each tile is a square, as is the player. But to make things at least somewhat interesting we’ll allow boxes of arbitrary width and height, as long as they’re not rotated. This is commonly called an Axis-Aligned Bounding Box (AABB for short).

As a small sidenote, there are already tons of existing libraries for vector math, collision detection, or even specifically ray vs AABB collision detection. I initially didn’t intend to do this part completely from scratch and at least use a small vector wrapper like victor.js, but the library looks basically dead and there are some outstanding issues left unfixed. Then there’s gl-matrix which seems to be up to date, but seems to be more concerned about performance and WebGL than anything else, which isn’t the priority in this series.

Similarly, there is ray-aabb-intersection and ray-aabb which are both more general than what we’ll implement here, but at the same time I feel that one has to ask themselves the question, why are we doing this? If the goal is to write production quality code, we probably wouldn’t pick small libraries which haven’t had any activity for 2-3 years. If the goal is learning, then we get the most out of it by implementing things ourselves.

I wouldn’t be against using a production quality collision library and focusing on learning other things, but there doesn’t seem to be one available for JavaScript, comparable to something like ncollide for Rust, and I don’t feel like bringing in the whole of Box2D just yet when we’re starting out. It will probably come in handy in a future blog post when we’ll need a more robust solution to collisions and physics.

Horizontal raycast against AABBs

Before we move on, let us define a few things:

AABB is a rectangle, which is defined by its top-left corner, width and height.
Ray is a half-line, which is defined by its origin and direction (up/down/left/right in our case).
A horizontal ray can be in three possible configurations with an AABB (consider ray direction to be right, the left case is symmetrical):
- Intersection: the ray starts left of the AABB (ray.origin.x < aabb.x) and hits the AABB (ray.origin.y >= aabb.y && ray.origin.y <= (aabb.y + aabb.h)).
- Inside: the ray starts in the inner interval of the AABB on the x-axis (ray.origin.x >= aabb.x && ray.origin.x <= (aabb.x + aabb.w)) and the same for the y-axis (ray.origin.y >= aabb.y && ray.origin.y <= (aabb.y + aabb.h)).
- No intersection: if neither of the above is true.

When we define these properties, we’re thinking in the HTML5 Canvas’ system of coordinates where positive X means right and positive Y means down. This is different from the usual Cartesian system of coordinates, which has its origin in the bottom left corner and positive Y going up.

Just looking at the definitions, they basically give us the answer already. We could just copy paste the definitions, adjust the comparisons for each of the 12 cases (3 for each direction, and there are 4 directions) and be done. But let’s try to be smart and generalize the conditionals at least a little bit.

var Direction = {
    UP: "up",
    DOWN: "down",
    LEFT: "left",
    RIGHT: "right"
};

// A ray is defined by its origin and direction.
function Ray(x, y, dir) {
    this.origin = { x: x, y: y };
    this.direction = dir;
}

// AABB is defined by its top-left corner, width and height.
function AABB(x, y, w, h) {
    this.x = x;
    this.y = y;
    this.w = w;
    this.h = h;
}

// Returns -1 if the value lies below the interval, 0 if it lies inside of it
// (bounds included), and 1 if it lies above.
function compareInterval(value, low, high) {
    // This shouldn't necessarily be required, but it allows us to just specify
    // the bounds of an interval, without checking which of the two is low and which is high.
    if (low > high) {
        var tmp = high;
        high = low;
        low = tmp;
    }

    // And then we simply check if the value lies outside of the interval
    if (value < low) {
        return -1;
    } else if (low <= value && value <= high) {
        return 0;
    } else {
        return 1;
    }
}

function intersect(ray, aabb) {
    switch (ray.direction) {
        case Direction.UP:
        case Direction.DOWN:
        case Direction.LEFT:
            // The other three directions are added further below.
            throw new Error("Only Direction.RIGHT is implemented so far");
        case Direction.RIGHT:
            // The ray has to start left of (or on) the left edge
            // and within the vertical extent of the box.
            if (ray.origin.x <= aabb.x &&
                compareInterval(ray.origin.y, aabb.y, aabb.y + aabb.h) === 0) {
                return { x: aabb.x, y: ray.origin.y };
            }
            return null;
    }
}

Before we move on any further, we should probably test this code to see if it works, since subtle bugs in collision detection could be hard to debug later on when we try to use it in a game. While the spirit of these articles is let’s write everything ourselves, I don’t feel that writing our own testing framework would be of any benefit, and if you’ve read this far, you can probably do it without many issues (at least for the synchronous testing, which is what we’ll be doing here).

We’ll use QUnit, because it was the only one that literally had a copy-paste-this-and-it-works setup, which makes it ideal for the JSFiddle based format of these articles. If you have suggestions for a better testing library, please do share them in the comments, but keep in mind that this series is not about setting up the most elaborate build system. If it needs a command line tool to run/build, it’s already too much of a hassle.

Here are a few tests for the Direction.RIGHT case:
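As a rough idea of what such tests check, here is a minimal sketch that uses plain assertions instead of QUnit so that it runs on its own. The `intersectRight` function is a compact re-statement of the RIGHT case, and `expectEqual` is a hypothetical helper, neither of them comes from the original test suite.

```javascript
// Compact re-statement of the RIGHT case of `intersect`, so that this snippet
// is self-contained. Rays and boxes are plain objects here.
function intersectRight(ray, aabb) {
    var withinHeight = ray.origin.y >= aabb.y && ray.origin.y <= aabb.y + aabb.h;
    if (ray.origin.x <= aabb.x && withinHeight) {
        return { x: aabb.x, y: ray.origin.y };
    }
    return null;
}

// Hypothetical helper: throws when the actual value differs from the expected one.
function expectEqual(actual, expected, message) {
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
        throw new Error(message + ": got " + JSON.stringify(actual));
    }
}

var box = { x: 10, y: 10, w: 20, h: 20 };

// Ray starts left of the box, within its vertical extent: hit on the left edge.
expectEqual(intersectRight({ origin: { x: 0, y: 15 } }, box), { x: 10, y: 15 }, "hit");
// Ray starts exactly on the left edge: still a hit, at zero distance.
expectEqual(intersectRight({ origin: { x: 10, y: 15 } }, box), { x: 10, y: 15 }, "touching");
// Ray passes above the box.
expectEqual(intersectRight({ origin: { x: 0, y: 5 } }, box), null, "above");
// Ray starts to the right of the left edge.
expectEqual(intersectRight({ origin: { x: 15, y: 15 } }, box), null, "inside");

console.log("all RIGHT tests passed");
```

The interesting cases are the boundaries: a ray starting exactly on an edge, and a ray at exactly the top or bottom of the box.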


Implementing the other three directions isn’t really challenging as everything is symmetrical. We just have to be careful not to make any mistakes when passing down coordinates. Here’s a full intersect function:

function intersect(ray, aabb) {
    switch (ray.direction) {
        case Direction.UP:
            // The ray has to start below (or on) the bottom edge.
            if (ray.origin.y >= aabb.y + aabb.h &&
                compareInterval(ray.origin.x, aabb.x, aabb.x + aabb.w) === 0) {
                return { x: ray.origin.x, y: aabb.y + aabb.h };
            }
            return null;
        case Direction.DOWN:
            // The ray has to start above (or on) the top edge.
            if (ray.origin.y <= aabb.y &&
                compareInterval(ray.origin.x, aabb.x, aabb.x + aabb.w) === 0) {
                return { x: ray.origin.x, y: aabb.y };
            }
            return null;
        case Direction.LEFT:
            // The ray has to start right of (or on) the right edge.
            if (ray.origin.x >= aabb.x + aabb.w &&
                compareInterval(ray.origin.y, aabb.y, aabb.y + aabb.h) === 0) {
                return { x: aabb.x + aabb.w, y: ray.origin.y };
            }
            return null;
        case Direction.RIGHT:
            // The ray has to start left of (or on) the left edge.
            if (ray.origin.x <= aabb.x &&
                compareInterval(ray.origin.y, aabb.y, aabb.y + aabb.h) === 0) {
                return { x: aabb.x, y: ray.origin.y };
            }
            return null;
    }
}

And here’s a JSFiddle with test cases for all four of the directions:


We can also create a simple wrapper to calculate the distance from the ray’s origin to an AABB:

function distance(a, b) {
    return Math.sqrt(Math.pow(a.x - b.x, 2) + Math.pow(a.y - b.y, 2));
}

function raycastDistance(ray, aabb) {
    var point = intersect(ray, aabb);

    if (point) {
        return distance(ray.origin, point);
    } else {
        return null;
    }
}

Raycasts against multiple box colliders

So far we implemented an intersect function which takes a ray and an AABB and calculates the intersection point if there is one. While this is useful, it would be far more useful to have a function which simply takes a ray and returns the closest box collider (AABB) the ray hits if there is one. There are numerous ways we could do this. In the ideal case, we’d use a spatial data structure (such as a quad tree or a spatial hash) to figure out the areas through which the ray is cast, and then check against colliders within that area. This avoids iterating the whole world on each raycast. Adding a more intelligent selection of colliders can be done separately from the rest of the code we’ll write here, which is why we’ll simply iterate all of the colliders and return the intersection with the closest one.

The code would look something like this:

function raycast(ray, world) {
    var collisions = world.colliders.map(function(aabb) {
        // Compute the intersection once and reuse it below.
        var intersection = intersect(ray, aabb);
        var result = {
            collider: aabb,
            intersection: intersection
        };

        if (intersection) {
            result.distance = distance(ray.origin, intersection);
        }

        return result;
    }).filter(function(collision) {
        return collision.intersection;
    });

    // Sort the collisions so that the closest one comes first.
    collisions.sort(function(c1, c2) {
        return c1.distance - c2.distance;
    });

    // JavaScript is happy with out of bound indexing, which means this line is still
    // valid even if there are no collisions (it returns undefined in that case).
    return collisions[0];
}

Now all we need is to fill a world with colliders and we can do arbitrary raycasts in horizontal/vertical directions and see what we hit. Let’s now go back to our game and test this out. I’ve added a few debug drawing helpers to visualize the rays and the distance from the collider.


Click on the canvas and press A and D to move. You can also click on the JavaScript tab to see the full code for this example.

An important thing to note is that we get 0 distance when touching the wall. This is due to the <= vs < comparisons, which make it easier for us to detect touching walls. It might be worth noting here that if our colliders weren’t aligned to integer coordinates, we might run into floating point rounding errors.
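To illustrate the kind of rounding error meant here, consider coordinates that are not whole numbers. The sketch below is only an illustration; the `lessOrAlmostEqual` helper and the chosen epsilon are assumptions made for this example, not part of the game code.

```javascript
// With fractional coordinates, two values that should be equal may differ slightly.
var wallX = 0.3;
var playerRight = 0.1 + 0.2; // 0.30000000000000004 in floating point

// The player should be touching the wall, but the comparison says otherwise.
console.log(playerRight <= wallX); // false

// Hypothetical helper: treats `a` as not greater than `b`
// when it exceeds it by no more than `epsilon`.
function lessOrAlmostEqual(a, b, epsilon) {
    return a <= b + epsilon;
}

console.log(lessOrAlmostEqual(playerRight, wallX, 1e-9)); // true
```

With integer coordinates none of this is needed, since sums and differences of small integers are represented exactly.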

Collisions on all sides

In order for the player to move around both up/down and left/right we need to detect collisions on all of the edges, so that the player stays blocked by a wall even if only a part of his collider is blocked. We do this by using two raycasts on each side, and storing the results on the player object.

A lot of the old collision code can be replaced with simple conditionals checking the computed distance against the velocity. Note the added lodash library. While it doesn’t provide anything we couldn’t implement ourselves I wanted to keep the code samples on point, and implementing stuff like _.flatten and _.min wouldn’t serve any purpose in this article.

We also inset each ray a little bit into the player box by a small offset off so that the ray doesn’t collide with boxes next to the player in a different direction. For example, a LEFT ray could collide with the ground beneath the player, which is not what we want, as the ground collision will be detected separately by the bottom rays. This is why the BOTTOM LEFT ray is lifted up a little bit to avoid touching the ground.
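The idea of insetting the rays can be sketched as follows. This is not the code from the demo, just an illustration that assumes the player is a box with `x`, `y`, `w` and `h` fields; `horizontalRays` is a made-up name.

```javascript
// Builds the two LEFT and two RIGHT rays for a player box. The rays are moved
// inwards vertically by `off`, so that they don't graze the floor or the ceiling.
function horizontalRays(player, off) {
    var top = player.y + off;
    var bottom = player.y + player.h - off;
    var right = player.x + player.w;

    return {
        left: [
            { origin: { x: player.x, y: top }, direction: "left" },
            { origin: { x: player.x, y: bottom }, direction: "left" }
        ],
        right: [
            { origin: { x: right, y: top }, direction: "right" },
            { origin: { x: right, y: bottom }, direction: "right" }
        ]
    };
}

var rays = horizontalRays({ x: 40, y: 100, w: 20, h: 20 }, 2);
console.log(rays.left[1].origin); // { x: 40, y: 118 }
```

The bottom left ray starts at y = 118 instead of y = 120, which keeps it just above a floor located at y = 120. The vertical rays would be inset along the x-axis in the same way.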


Click on the canvas and press A and D to move. You can also click on the JavaScript tab to see the full code for this example.

Gravity

Introducing gravity isn’t terribly difficult. We implement it just like in real life, as a force causing downward acceleration. This is easier than it sounds: acceleration simply means velocity change per second, which we can implement by adding vertical velocity to the player, and then increasing that velocity by a gravitational constant (scaled by dt) on each frame.
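A minimal sketch of that update step could look like this. The value of `GRAVITY`, the `vy` field and the `applyGravity` function are assumptions made for this example, and `distanceToGround` stands in for the result of the downward raycasts.

```javascript
var GRAVITY = 800; // px/s^2, an arbitrary value picked for this sketch

// Advances the player's vertical motion by `dt` seconds. `distanceToGround`
// is null when there is nothing below the player.
function applyGravity(player, distanceToGround, dt) {
    // Acceleration changes the velocity, velocity changes the position.
    player.vy += GRAVITY * dt;
    var step = player.vy * dt;

    // Don't move further than the ground allows, and stop falling on landing.
    if (distanceToGround !== null && step >= distanceToGround) {
        step = distanceToGround;
        player.vy = 0;
    }

    player.y += step;
}

var player = { y: 0, vy: 0 };
applyGravity(player, 100, 0.5);
console.log(player); // { y: 100, vy: 0 }
```

In the example call the player would have fallen 200 pixels in half a second, but the ground is only 100 pixels away, so the player lands on it and the vertical velocity is reset.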


Click on the canvas and press A and D to move and W to jump. You can also click on the JavaScript tab to see the full code for this example.

Overall, after we have the collision rays implemented, the rest of the logic is just a matter of writing out a few conditionals that handle each case.

There is one bug though, which we won’t fix. When the player moves in a perfectly diagonal direction as they’re falling and lands on a corner of a wall the collision won’t detect the wall and the player will fall through. This is because both of the rays on the corner are inset a small amount, causing the player to fall a little bit inside the wall. After that, the collision detection part will notice the rays start inside a wall and return null (because the player is technically inside the box at that point).

This goes to show that even something as simple as casting a few rays against boxes can have lots of edge cases that need to be handled. It is also a good argument for not implementing physics from scratch in a game that will go into production, as even a simple case like this isn’t trivial to get completely right.

Conclusion

We’ve implemented basic collision handling and gravity. The game finally plays like a very simple platformer. We won’t be diving into physics anymore, at least not in the sense of implementing them ourselves. Instead, we’ll look at tweening next so that we can add a few simple animations to the game.


Let's Write a 2D Platformer From Scratch Using HTML5 and JavaScript, Part 1: Game Loop

Sun, 28 Jan 2018 00:00:00 +0000

This article is part 1 of the Let’s Write a 2D Platformer From Scratch Using HTML5 and JavaScript series.

As far as gamedev and HTML5 go, there are tons of great game engines already out there. How do we pick the right one? Look at the number of stars on GitHub? The number of contributors? The number of published games? If you’ve looked at the previous articles on this blog, you probably know where this is heading (or read the title for that matter). We’re going to write our own game engine first!

Writing a game engine is not an easy task however. We’ll start out with just a simple 2D platformer. There won’t be any asset pipeline, and all the rendering will be done with rectangles using the simple HTML5 Canvas API. But this does not prevent us from doing animations. We’ll also write a simple tweening library to make animations and other time-based effects easy to add. But let’s first begin with the game loop.

If you’re curious to see where this series is going, here’s a little sneak peek of what we’ll have at the end of part 3, in which we implement collisions and gravity. Don’t worry if it seems like a lot of code (click on the JavaScript tab to see the source), we’ll build it up step by step in a way that everything should be clear along the way.

Click on the canvas and press A and D to move and W to jump. You can also click on the JavaScript tab to see the full code for this example.

Deploying a game like this is easy. If you click on Edit in JSFiddle on the top right, you’ll see there is an HTML and a JavaScript part. The HTML only has two lines, one of which defines the canvas, and one defines a div used for debugging. That’s all that is needed for the game to work. After that, you can just add a script tag with all of the code (under the JavaScript section) and you’re almost ready to go. While this blog series doesn’t rely heavily on libraries, later on we’ll add Lodash to keep the code cleaner without writing much boilerplate. Adding Lodash is easy as well, all it takes is a single line to serve it from a CDN. Overall, the page to deploy the game could look something like this:

<!DOCTYPE html>
<html>
<head>
    <script src="https://unpkg.com/lodash@4.17.4/lodash.js"></script>
</head>
<body>
    <canvas id="canvas" height="200" width="200" tabindex="1"></canvas>
    <div id="debug-text"></div>
    <script src="./game.js"></script>
</body>
</html>

But that’s it for the spoilers! We first need to build our game, so let’s get started.

Game loop

The core of the game loop is calculating dt (or deltaTime), which is the time elapsed since the last frame. Every run of the game loop then updates all of the necessary logic. If we keep dt in seconds, we can measure all velocities as per second and calculate the per-frame update by simple multiplication. For example, if we intend to change player.x by 40 pixels per second, we can calculate the offset of a single frame by just doing player.x += 40 * dt. The units simply add up: px/s * s = px.
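The point of multiplying by dt is that the movement no longer depends on the frame rate. The following sketch simulates one second of movement at two different frame rates; `simulateOneSecond` is a made-up helper and the fixed frame rate is a simplification for this example.

```javascript
// Moves `x` at `speed` pixels per second for one simulated second,
// running at a fixed number of frames per second.
function simulateOneSecond(speed, fps) {
    var dt = 1 / fps; // time elapsed per frame, in seconds
    var x = 0;

    for (var frame = 0; frame < fps; frame++) {
        x += speed * dt;
    }

    return x;
}

console.log(simulateOneSecond(40, 30));
console.log(simulateOneSecond(40, 60));
```

Both calls print a value within rounding error of 40, even though the second one runs twice as many frames.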

Modern browsers have a great way of implementing the game loop with the requestAnimationFrame function. This function takes a callback, which is then called right before the next repaint of the browser window. It only calls the callback once, which means if we want our game loop to run continually we need to call requestAnimationFrame at the end of it. One neat feature is that the requestAnimationFrame function calls our callback with a timestamp argument, which basically indicates the number of milliseconds since the page has loaded.

function gameLoop(timestamp) {
    // Normally we would update the game logic here.
    console.log(timestamp);

    // Enqueue another run of the `gameLoop` function on the next browser repaint.
    window.requestAnimationFrame(gameLoop);
}

// We also initially call the `gameLoop` function via `requestAnimationFrame`.
window.requestAnimationFrame(gameLoop);

We can use the timestamp to calculate our dt value. The number of callbacks of requestAnimationFrame is usually 60 times per second (60 FPS), which amounts to about 16 milliseconds per frame. We could use new Date().getTime() to access the current time, but there is a newer and better API specifically intended for performance measurements. The function is performance.now() and also returns the number of milliseconds since the page has loaded. This is actually the same value that window.requestAnimationFrame passes into the callback, so we can use it to calculate the initial time before the game loop begins.

// We initialize the time of the last frame to the current time.
var lastFrame = performance.now();

// window.requestAnimationFrame calls our game loop with a timestamp
// of when the callback started being processed (in milliseconds).
function gameLoop(timestamp) {
    // We calculate the time since last frame in seconds
    // and update the timestamp of the last frame.
    var dt = (timestamp - lastFrame) / 1000;
    lastFrame = timestamp;

    console.log(dt);

    window.requestAnimationFrame(gameLoop);
}

window.requestAnimationFrame(gameLoop);

If we didn’t initialize the lastFrame variable to performance.now(), we could run into an issue of our game jumping forward in time on the first frame. This could happen especially if the game doesn’t start immediately with the page loading. To test this, try opening up the Developer Console on any web page (this one for example) and enter window.requestAnimationFrame(console.log) after a few seconds, and you’ll see a fairly large number.

There is one problem with this approach to lastFrame initialization. It does not work in Chrome! It works just fine in Firefox, but Chrome has an open issue since 2013 which causes it to call the gameLoop with a timestamp lower than the initial value returned by performance.now(). In other words, the first frame would get a negative dt. Fortunately, the workaround isn’t terribly difficult. We can implement simple frame limiting that caps our game loop at 60 FPS, which will also fix this issue by skipping a loop if the time that has passed is less than 1000 / 60 milliseconds.

var lastFrame = performance.now();

function gameLoop(timestamp) {
    // Moving `requestAnimationFrame` won't change how the loop behaves, since JavaScript
    // runs synchronously from top to bottom and we can't get interrupted in the middle
    // of the game loop by another call caused by an earlier `requestAnimationFrame`.
    window.requestAnimationFrame(gameLoop);

    // Here we simply skip the whole iteration if enough time hasn't passed yet.
    if (timestamp < lastFrame + (1000 / 60)) {
        return;
    }

    var dt = (timestamp - lastFrame) / 1000;
    lastFrame = timestamp;

    console.log(dt);
}

window.requestAnimationFrame(gameLoop);

One last thing we might want to add before moving on is the ability to stop the game loop. Luckily, requestAnimationFrame returns an ID which can later be passed to window.cancelAnimationFrame() to cancel the scheduled frame request. All we have to do is store this value in each iteration of the gameLoop.

var lastFrame = performance.now();
var requestAnimationFrameId;

function stopGameLoop() {
    window.cancelAnimationFrame(requestAnimationFrameId);
}

function gameLoop(timestamp) {
    requestAnimationFrameId = window.requestAnimationFrame(gameLoop);

    if (timestamp < lastFrame + (1000 / 60)) {
        return;
    }

    var dt = (timestamp - lastFrame) / 1000;
    lastFrame = timestamp;

    console.log(dt);
}

requestAnimationFrameId = window.requestAnimationFrame(gameLoop);

Calculating FPS with exponential moving average

Lastly, before moving on to implement tweening I’d like to show one more useful thing. Games often have the ability to display FPS as the game is running. The easiest way is to use an exponential moving average, which requires no additional memory, compared to the often mentioned method of using an array of older values and doing a running average on those. If we used an array to store say 10 previous values, calculated the average of those on each frame, and pushed a new value on the next frame while popping the oldest value, we’d get what is called a moving average. The key factor there is that all values have the same weight. In exponential smoothing, on the other hand, we put more weight on newer values and decay the older ones exponentially. That means if we have an exponential moving average calculated from 10 values, the newer values will contribute to the result much more than the older ones.

Now let’s see how it works. First we have to pick an $\alpha$ value which determines how quickly we decay older values. A common choice is $\alpha = 0.1$. To calculate the value for the current frame, $FPS_{current}$, we use the value from the last calculation, $FPS_{last}$, and the FPS based on the current dt, which is calculated as $\frac{1}{dt}$. Putting all this together we get:

$$FPS_{current} = \alpha \cdot \frac{1}{dt} + (1 - \alpha) \cdot FPS_{last}$$

Or alternatively (after a few basic algebraic operations):

$$FPS_{current} = FPS_{last} + \alpha \cdot \left(\frac{1}{dt} - FPS_{last}\right)$$

While this might look like a lot of complicated math, it really isn’t. We’re just scaling down the old value by $(1 - \alpha)$ each time we add a new one. After a few iterations, the initial value has been scaled down by $(1 - \alpha)$ multiple times.
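To get a feeling for how quickly the average forgets its initial value, here is a small sketch that feeds a constant 60 FPS into the first formula. The `emaUpdate` function is a made-up name for a single update step.

```javascript
// One step of the exponential moving average, where `alpha` is the weight
// given to the newest sample.
function emaUpdate(previous, sample, alpha) {
    return alpha * sample + (1 - alpha) * previous;
}

// Start from a deliberately wrong value and feed in a constant 60 FPS.
var fps = 1;
for (var i = 0; i < 100; i++) {
    fps = emaUpdate(fps, 60, 0.1);
}

console.log(fps);
```

After 100 frames the initial value of 1 has been scaled down by 0.9 a hundred times, so the printed result is very close to 60. A larger alpha reacts faster to changes, but also makes the displayed number jump around more.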

Implementing this in code is easy, we just pick one of the formulas and write it as is, updating a FPS variable after each dt is calculated. One last note

var lastFrame = performance.now();
var requestAnimationFrameId;
var FPS = 1; // It doesn't really matter what value we initialize FPS to.
var alpha = 0.1;

function stopGameLoop() {
    window.cancelAnimationFrame(requestAnimationFrameId);
}

function gameLoop(timestamp) {
    requestAnimationFrameId = window.requestAnimationFrame(gameLoop);

    if (timestamp < lastFrame + (1000 / 60)) {
        return;
    }

    var dt = (timestamp - lastFrame) / 1000;
    lastFrame = timestamp;
    // Move the average towards the FPS of the current frame, weighted by alpha.
    FPS = FPS + alpha * (1 / dt - FPS);

    console.log(FPS);
}

requestAnimationFrameId = window.requestAnimationFrame(gameLoop);

Conclusion

This concludes the first part in this series. We’ve written a simple game loop with an FPS counter. The next article will continue with basic rendering and input handling.


Let's Write a 2D Platformer From Scratch Using HTML5 and JavaScript, Part 2: Rendering

Sun, 28 Jan 2018 00:00:00 +0000

This article is part 2 of the Let’s Write a 2D Platformer From Scratch Using HTML5 and JavaScript series.

Now that we have the game loop, we can build a small wrapper around the HTML5 Canvas API. We’ll start with a simple tile based map where each tile is rendered as a colored rectangle.

The map will be specified as an array of numbers where 0 means empty and 1 means wall. JavaScript doesn’t have direct support for multi-dimensional arrays, which leaves us with two options. We can either store all of the values in a 1D array and calculate the index based on 2D coordinates, or we can use an Array of Arrays. The second approach is simpler in terms of indexing, but works conceptually very differently. For example, there is nothing enforcing each row to have the same length as the other rows.

Using 1D arrays to store a 2D matrix is actually a very common pattern in lower level programming, which is why we’ll pick it here, mainly for educational purposes. The core idea is that if we want to access an element at the i-th row and j-th column, we’ll have to skip i * ROW_LENGTH elements to get to the part of the array where the i-th row begins. After that, we just add the offset j within the i-th row to access the element. Since we’re specifying map dimensions as MAP_W and MAP_H (for map width and height) we simply do i * MAP_W + j.
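Here is the index calculation on its own, using a tiny map that is easy to check by hand. The 4 by 3 map and the `tileAt` helper are made up for this example.

```javascript
var MAP_W = 4;
var MAP_H = 3;

// 3 rows of 4 tiles stored in a single flat array.
var map = [
    0, 0, 0, 0,
    0, 0, 1, 0,
    1, 1, 1, 1
];

// Returns the tile at row i and column j.
function tileAt(i, j) {
    // Skip i full rows, then move j tiles into the row.
    return map[i * MAP_W + j];
}

console.log(tileAt(1, 2)); // 1
console.log(tileAt(0, 3)); // 0
```

Row 1 starts at index 4, so the tile at row 1 and column 2 lives at index 6. Note that the row length used in the multiplication is the width of the map, not its height.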

+ +


var canvas = document.getElementById("canvas");
+var ctx = canvas.getContext("2d");
+
+var BOX_SIZE = 20;
+var MAP_W = 10;
+var MAP_H = 10;
+
+function drawBox(color, x, y, w, h) {
+    ctx.fillStyle = color;
+    ctx.fillRect(x, y, w || BOX_SIZE, h || BOX_SIZE);
+}
+
+var map = [
+  0,0,0,0,0,0,0,0,0,0,
+  0,0,0,0,0,0,0,0,0,0,
+  0,0,0,0,0,0,0,0,0,0,
+  0,0,0,0,0,0,0,0,0,0,
+  0,0,0,1,1,1,0,0,0,0,
+  0,0,0,0,0,0,0,0,1,1,  
+  1,0,0,0,0,0,0,0,0,0,
+  1,0,0,0,0,0,0,0,0,0,
+  1,1,1,1,1,0,1,1,1,1,
+  1,1,1,1,1,0,1,1,1,1,  
+];
+
+function drawMap() {
+    for (var i = 0; i < MAP_H; i++) {
+        for (var j = 0; j < MAP_W; j++) {
+            // Calculating the color for a tile on coordinates [j, i].
+            var color = map[i * MAP_W + j] ? "#5d995d" : "lightblue";
+            // And draw it at the appropriate offset.
+            drawBox(color, BOX_SIZE * j, BOX_SIZE * i);
+        }
+    }
+}
+
+// Draw the player.
+drawBox("#612b2e", 0, 5 * BOX_SIZE);

+ +

Later on when we put things together, each iteration of the gameLoop will call drawMap to render the background.

function gameLoop(timestamp) {
+    // ... rest of the game loop
+
+    drawMap();
+    // Draw the player.
+    drawBox("#612b2e", 0, 5 * BOX_SIZE);
+}
+

Now all that is left to do is implement player movement.

Input handling and simple movement

JavaScript has no built-in way to ask whether a key is currently being held down, so we’ll register global keyup and keydown handlers that record the state of each key in a global map. Later on we can add the ability to detect a key press just in the frame in which it occurred.

var keys = {};
+window.onkeyup = function(e) { keys[e.keyCode] = false; }
+window.onkeydown = function(e) { keys[e.keyCode] = true; }
+
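
The frame-specific detection mentioned above could be sketched as follows. The names `justPressed`, `handleKeyDown`, `handleKeyUp` and `endFrame` are hypothetical, and the sketch assumes that `endFrame` is called once at the end of every game loop iteration.

```javascript
var keys = {};
// Keys that went down since the last call to endFrame().
var justPressed = {};

function handleKeyDown(keyCode) {
    // Browsers repeat keydown while a key is held, so only the
    // transition from "up" to "down" counts as a fresh press.
    if (!keys[keyCode]) { justPressed[keyCode] = true; }
    keys[keyCode] = true;
}

function handleKeyUp(keyCode) {
    keys[keyCode] = false;
}

// Assumed to be called once at the end of each frame.
function endFrame() {
    justPressed = {};
}

// Only attach the handlers when running in a browser.
if (typeof window !== "undefined") {
    window.onkeydown = function(e) { handleKeyDown(e.keyCode); };
    window.onkeyup = function(e) { handleKeyUp(e.keyCode); };
}
```

With this in place, `keys[code]` still answers "is the key held right now", while `justPressed[code]` is only true during the frame in which the key went down.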

With these, we can write an updatePlayer function which takes a dt and moves the player based on a key being pressed. We’ll also need a drawPlayer function to draw the player at their position.

var player = { x: 0, y: 0 };
+
+function updatePlayer(dt) {
+    // Key codes for player hotkeys.
+    var A = 65;
+    var W = 87;
+    var D = 68;
+    
+    // The player moves at 80px per second.
+    var SPEED = 80;
+    
+    if (keys[A]) { player.x -= SPEED * dt; }
+    if (keys[D]) { player.x += SPEED * dt; }
+}
+
+function drawPlayer() {
+    drawBox("#612b2e", player.x, player.y);
+}
+

Basic collision handling

Collision handling is a complicated subject, especially if there can be arbitrary geometry present in the physics world. Luckily for us, we only have boxes of constant dimensions, and all of the walls are aligned to the tile map. The player is also the only object moving in the world, which means we only calculate collisions against the environment. If we had a multi-agent environment, we’d need a more general concept of colliders and raycasting. But for now, we can implement the raycast by simply checking the adjacent tiles on each side.

Since the player can move by just a single pixel, we need to calculate both the tile it is in and its position within that tile. We’ll store the tile coordinates in variables tileX and tileY, computed by dividing the position by BOX_SIZE and rounding down. For the position within the tile we can use the modulo operation %, which returns the remainder after integer division. player.x % BOX_SIZE returns a value from 0 up to (but not including) BOX_SIZE, which is exactly the x offset of the top left corner within its containing tile.
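
A short worked example of this arithmetic, using a made-up player position of 47 pixels. The helper name `tileInfo` is an assumption for this sketch, and it only covers non-negative positions.

```javascript
var BOX_SIZE = 20;

// Returns the tile column containing x and the offset of x inside that tile.
function tileInfo(x) {
    return {
        tile: Math.floor(x / BOX_SIZE),
        offset: x % BOX_SIZE,
    };
}

console.log(tileInfo(47));
// { tile: 2, offset: 7 }
console.log(tileInfo(40));
// { tile: 2, offset: 0 }
```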

We’ll also check if the player stands next to a wall on both sides. The right side is a tiny bit trickier, because we’re measuring the position from the top-left corner, which means we actually have to look two tiles to the right. We’ll improve this later on when we write a more general collision handling logic.

Lastly, we introduce a new concept, the player’s velocity. This can be thought of as the number of pixels the player will move within the frame (per-frame velocity). The velocity is initially based on the player’s inputs. We then check if the player is moving in the direction of a wall, and check if the velocity is greater than the distance to the wall. If it is, the player would skip into the wall on the frame update, which is why we use Math.min/Math.max to make sure the player moves at most the distance he needs to reach the wall.
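
To make the clamping concrete, here is a minimal sketch with made-up numbers. `clampLeft` and `clampRight` are hypothetical helpers that only illustrate the Math.max/Math.min idea, assuming the distance to the wall has already been computed.

```javascript
// Moving left: vx is negative, distanceToWall is a non-negative pixel count.
function clampLeft(vx, distanceToWall) {
    return Math.max(vx, -distanceToWall);
}

// Moving right: vx is positive.
function clampRight(vx, distanceToWall) {
    return Math.min(vx, distanceToWall);
}

console.log(clampLeft(-10, 7));
// -7
console.log(clampLeft(-3, 7));
// -3
console.log(clampRight(10, 4));
// 4
```

In the first call the player wants to move 10 pixels but the wall is only 7 pixels away, so the movement is cut short. In the second call the wall is out of reach for this frame and the velocity is left untouched.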

function updatePlayer(dt) {
+    // Key codes for player hotkeys.
+    var A = 65;
+    var W = 87;
+    var D = 68;
+    
+    // The player moves at 80px per second.
+    var SPEED = 80;
+    
+    // We calculate the tile where the player is.
+    var tileX = Math.floor(player.x / BOX_SIZE);
+    var tileY = Math.floor(player.y / BOX_SIZE);
+
+    // Player collides on the left either with the leftmost edge of the screen,
+    // or with a tile which is adjacent to the left.
+    var possibleCollisionLeft = tileX == 0 || map[tileY * MAP_W + (tileX - 1)];
+    // Same for the right side.
+    var possibleCollisionRight = tileX == (MAP_W - 1) || map[tileY * MAP_W + (tileX + 2)];
+    
+    // Horizontal velocity of the player (in pixels for this frame).
+    var vx = 0;
+
+    if (keys[A]) { vx = -SPEED * dt; }
+    if (keys[D]) { vx = SPEED * dt; }
+
+    if (vx < 0 && possibleCollisionLeft) {
+        // If the player is near a left wall, either move him closer to the wall
+        // based on his velocity, or based on his offset within the tile if the velocity
+        // would cause him to run through the wall.
+        vx = Math.max(vx, -(player.x % BOX_SIZE));
+    }
+            
+    if (vx > 0 && possibleCollisionRight) {
+        // Same as for moving left, but here we have to account for the fact
+        // that we use the left corner as the player's position, hence the distance
+        // to the wall is computed differently.
+        vx = Math.min(vx, BOX_SIZE - (player.x % BOX_SIZE));
+    }
+    
+    // Lastly, we have to check if the player is already standing next to a wall,
+    // and nullify the horizontal velocity in that case.
+    if (vx > 0 && map[tileY * MAP_W + (tileX + 1)]) {
+        vx = 0;
+    }
+    
+    player.x += vx;
+}
+

And here’s how it looks inside the game: (move player with A and D keys)

+ +

Conclusion

We’ve implemented basic input and collision handling with player movement. What we have so far serves more as a demonstration than as the basis for what we’ll end up with in the next article, as the collision handling is not flexible enough to handle more complicated player movement such as gravity.



Binary Search in JavaScript

Thu, 25 Jan 2018 10:00:00 +0100

Binary search is an extremely useful and popular algorithm for quickly searching in a sorted array. People even use it in real life (outside of programming) when they play the guess the number game. One person thinks of a number between 1 and 100, and the other person tries to guess it. The response is only less, equal or greater. If you guess 50 and get a less response, you have narrowed the search down to roughly half the interval, 1 to 49. You can keep going and guess 25. No matter what answer you get, you either win, or narrow the interval down to a half again. This is how binary search works. We test our searched value against the element in the middle. If it’s less than the middle element, we repeat the process on the left part; if it’s greater than the middle element, we repeat the process on the right part. We repeat this until we get a value which is equal, at which point the search is complete.

The initial requirement is that the array must be sorted. Why? Because a sorted array has a simple property. If we take any two indexes i and j, if i <= j, then array[i] <= array[j]. This means if we know that the searched number is lower than the number at the middle index, its index must be also lower.

To implement the algorithm we will use two variables to store the bounds of our search area, low and high. We begin by setting low = 0 and high = array.length. We calculate the middle as the average of low and high, always rounding down using the Math.floor function. Note that we could also use a bitwise shift to the right with the >> operator to achieve the same, but we’ll keep the division explicit to make things easier to read. We then compare the number at the middle index to our search value. Here we can run into three different cases:

If array[mid] < value, then we need to move to the right, updating our lower bound low = mid + 1. We add the one because the value can’t possibly be at mid (since it’s greater than array[mid]), so we can skip that index entirely.
If array[mid] > value, then we need to move to the left, updating our upper bound high = mid. Since we initialize our high to array.length, the search range does not include the high index (it’s a half-open interval, closed on the left and open on the right), which means we don’t need to subtract 1 from the upper bound.
If array[mid] == value, we simply return the mid index as a result.

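We keep iterating while low < high. If the value is present, one of the iterations hits the third case and returns its index. Otherwise the loop ends with low == high, meaning the search range is empty, and that final position is the index at which the value would have to be inserted to keep the array sorted.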
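
The equivalence between the bitwise shift and the explicit division mentioned above can be checked with a short sketch (the function names are made up for this example). It assumes that low + high stays below 2^31, since JavaScript’s bitwise operators work on 32-bit integers.

```javascript
function midFloor(low, high) {
    return Math.floor((low + high) / 2);
}

function midShift(low, high) {
    // Shifting right by one bit halves a non-negative integer, rounding down.
    return (low + high) >> 1;
}

console.log(midFloor(0, 7), midShift(0, 7));
// 3 3
console.log(midFloor(4, 7), midShift(4, 7));
// 5 5
```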

function binarySearch(array, value) {
+    var low = 0;
+    var high = array.length;
+
+    while (low < high) {
+        var mid = Math.floor((low + high) / 2);
+
+        if (array[mid] < value) {
+            low = mid + 1;
+        } else if (array[mid] > value) {
+            high = mid;
+        } else {
+            return mid;
+        }
+    }
+
+    return high;
+}
+

Lastly, it’s important to note that if we search for an element which is not present in our array, we still get an index as a return value. There are two ways to think about this. Either we think of the binarySearch function as searching for an index of an existing element inside the array, at which point we might want to return -1 in case the element isn’t found. Or we use it to calculate an index at which we should insert a new element into the array so that it remains sorted.

If we wanted the first variant, we could modify the binary search to check the result value before returning it.

function binarySearch(array, value) {
+    var low = 0;
+    var high = array.length;
+
+    while (low < high) {
+        var mid = Math.floor((low + high) / 2);
+
+        if (array[mid] < value) {
+            low = mid + 1;
+        } else if (array[mid] > value) {
+            high = mid;
+        } else {
+            return mid;
+        }
+    }
+
+    if (array[high] == value) {
+        return high;
+    } else {
+        return -1;
+    }
+}
+

Alternatively, we can use the first implementation to implement an insert into a sorted array which uses binary search to find the right place to insert the value. This is rather trivial, we simply find the proper index with the first version of our binarySearch, and then use Array.splice to insert the new value.

The original code is provided here again so that the example as a whole can be copy-pasted into a developer console (or any other JavaScript environment) for experimentation.

function binarySearch(array, value) {
+    var low = 0;
+    var high = array.length;
+
+    while (low < high) {
+        var mid = Math.floor((low + high) / 2);
+
+        if (array[mid] < value) {
+            low = mid + 1;
+        } else if (array[mid] > value) {
+            high = mid;
+        } else {
+            return mid;
+        }
+    }
+
+    return high;
+}
+
+function binaryInsert(array, value) {
+    var index = binarySearch(array, value);
+    array.splice(index, 0, value);
+}
+
+var arr = [1, 2, 3];
+
+binaryInsert(arr, 2);
+console.log(arr);
+// [1, 2, 2, 3]
+
+binaryInsert(arr, 5);
+console.log(arr);
+// [1, 2, 2, 3, 5]
+
+binaryInsert(arr, 0);
+console.log(arr);
+// [0, 1, 2, 2, 3, 5]
+

Time complexity

The main benefit of binary search is its time complexity, which is only $O(\log n)$, compared to regular linear search (going through the whole array searching for the right index), which is $O(n)$. Searching 1000000 elements using binary search would only take roughly 20 steps, while linear search could take up to 1000000 steps.

But why is it $O(\log n)$? Each iteration of binary search reduces the search space by half, which means we can translate the question how many steps does it take? to how many times can we take a half of a number until we get to 1? If the array length was a power of two, meaning we could write it as $2^k$, we could divide it by 2 exactly k times. If we have an arbitrary number n and we want to write it as $2^k$, we can calculate k with a logarithm, specifically $\log_2 n = k$. But by the definition of the big-O notation, we don’t need to worry about constants, and since $\log_2 n = \frac{\log n}{\log 2}$, we can simply use $O(\log n)$.
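
The halving argument can be checked numerically. The sketch below (with a hypothetical helper `countHalvings`) counts how many times a number can be halved, rounding down, before it reaches 1.

```javascript
function countHalvings(n) {
    var steps = 0;
    while (n > 1) {
        n = Math.floor(n / 2);
        steps++;
    }
    return steps;
}

console.log(countHalvings(1024));
// 10, since 2^10 = 1024
console.log(countHalvings(1000000), Math.floor(Math.log2(1000000)));
// 19 19
```

This matches the estimate of roughly 20 steps for an array of 1000000 elements.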

Binary Heap (Priority Queue) in JavaScript

Wed, 24 Jan 2018 10:00:00 +0100

A binary heap is a simple data structure most often used for implementing priority queues. In a more general sense, a heap is a tree-based data structure which satisfies the heap property, and a binary heap is a heap which uses a binary tree to store its data. Any arbitrary binary tree which satisfies the following two properties can be considered a binary heap:

heap property: If P is a parent node of N, then P.key <= N.key, meaning a parent never has a higher key than its children. This gives us a MIN heap, where the root node has the minimum key of the whole heap. If we want a MAX heap, we simply flip the property to P.key >= N.key. Everything that is true for a MIN heap is true for a MAX heap, so from now on, we’ll consider a MIN heap only.
All levels of the tree are completely filled, except for the last one, which is filled from the left.

Here’s an example of a valid heap. Note that the last level is missing one element on the right. It’s also important to point out that a binary heap is not a search tree. While a binary search tree satisfies the property that the left child has a lower value than the parent and the right child has a higher value than the parent, a binary heap has no such property. This also means that a binary heap cannot be searched efficiently: finding an arbitrary key may require looking at every node.

+ +

Here is another example, but this time of a tree which doesn’t follow the heap property, and as such is not a binary heap.

+ +

And here’s an example of a tree which doesn’t satisfy the second property, as the last layer is not filled from the left.

+ +

The first property (the heap property) gives us the ability to access the MIN element in constant time, because it must be in the root of the tree. If the MIN was somewhere down the tree, its parent would need to have a larger key value, which would break the heap property. If it had a smaller value, the MIN element wouldn’t be a true minimum, which is a contradiction with our choice of MIN as the minimum element of the whole heap.

The second property isn’t so obvious, but it allows us to store the tree not as a network (graph) of nodes with edges, but as an array of numbers, where the edges can be calculated implicitly. The second property allows us to think about the binary tree as if it was a complete binary tree. There can’t be any holes (missing elements) in the middle of the tree, only at the very right edge in the last layer. To figure out how to store the tree in an array we can first ignore the missing elements in the last layer and think about the tree as if it was complete. Here’s how such tree might look:

+ +

If we write it out layer by layer in an array, we simply get:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
+

Since JavaScript arrays are 0 indexed, we can do a little trick and add a blank element to the beginning of the array, which will make every number be equal to its index. I’m using null to make it clear that the first element is not really part of the data structure and only acts to fill in space. In reality, we could use something like Uint32Array and leave the first index set to 0.

[null, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
+

If you look closely at the binary tree, you can see that the left child of each node has double the value of its parent, and the right child has double the value plus one. Looking at 7 for example, the left child is 2*7 = 14 and the right child is 2*7 + 1 = 15. This property is only true when we have a complete binary tree, which is exactly why the binary heap requires all layers but the last to be full.

Now looking back at the array, we can also see something interesting. Because we know that the root is at index 1, its left child must be at index 2*1 = 2 and right child at index 2*1 + 1 = 3. This will also be true for any other element, seeing that the array copies the structure of the tree. If the left child has double the value in the tree, and the array has values mapped to their index, then the left child in the array (having double the value) will be at double the index.

We can also get rid of the initial blank element in the array by simply shifting all results off by 1 to the right, giving us 2*i + 1 and 2*i + 2. Let’s try this:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
+

The left child of 1 is 2, which we can get to by calculating 2*i + 1, where i = 0 (because 1 is at 0th index). We would get 2*0 + 1 = 1, which is the index of its left child, the number 2. Going further down, getting the right child of 2 (which is a 5) we take the index of 2 (which is 1) and plug it in the 2*i + 2 formula, giving us 2*1 + 2 = 4, which is the index of the value 5. Let’s try getting the right child of 5. We take the index, which is 4 and do 2*4 + 2 = 10, giving us the index of the right child, the number 11.

It is important to note that all the calculations are done on indexes, not the actual values. We only used numbers from 1 to 15 to make it easy to see the pattern for calculating left/right children. We could just as well do the same on a completely different binary heap, and it would work, because there isn’t any point in the computation where the value is being used, only the index.

+ +

We can now also see, that this way of calculating children will work even if we leave out a few nodes in the last layer, considering it is still filled from the left. Going the other way, it’s not hard to see that if we left out one node in the middle of the tree, suddenly all of this would stop working and we would no longer have a simple formula for calculating children.

Lastly, we need a way to navigate back up the heap, going from children to the parent. This is easy to figure out, because if left = 2*i + 1, then i = (left - 1) / 2 when going from the left child, and if right = 2*i + 2, then i = (right - 2) / 2. This gives us

$$\text{parent} = \frac{\text{left} - 1}{2} \text{ or } \frac{\text{right} - 2}{2}$$

Let’s do a little renaming first, since we’re going from an index of the child, let’s call that i, giving us a more general equation

$$\text{parent} = \frac{i - 1}{2} \text{ or } \frac{i - 2}{2}$$

Now consider the expression $\lfloor \frac{i - 1}{2} \rfloor$, equivalent to Math.floor((i - 1) / 2). If we had our original 2*i and 2*i + 1, then Math.floor(i / 2) would definitely work, because rounding down gets rid of the +1 in the second case. In the more complicated case of 2*i + 1 and 2*i + 2, we can think of the i - 1 as moving back to the simpler case, after which the same floor function applies. A single formula therefore covers both the left and the right child.
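
As a quick sanity check of the formula, the sketch below verifies for the first hundred indexes that applying Math.floor((i - 1) / 2) to either child index leads back to the parent. The helper names `left`, `right` and `parent` are chosen here for illustration.

```javascript
function left(i) { return 2*i + 1; }
function right(i) { return 2*i + 2; }
function parent(i) { return Math.floor((i - 1) / 2); }

var allMatch = true;
for (var p = 0; p < 100; p++) {
    if (parent(left(p)) !== p || parent(right(p)) !== p) {
        allMatch = false;
    }
}

console.log(allMatch);
// true
```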

Navigating a binary heap in an Array

Let’s create an Array for the following heap:

+ +

We simply write down the numbers going layer by layer, left to right, getting:

// indexes: 0  1  2  3  4  5 
+var heap = [1, 3, 5, 4, 9, 6];
+
+function left(i) { return 2*i + 1; }
+function right(i) { return 2*i + 2; }
+function parent(i) { return Math.floor((i - 1) / 2); }
+
+console.log(heap[left(0)]);
+// 3
+console.log(heap[right(0)]);
+// 5
+
+// We can even use .indexOf() to make this a bit clearer
+
+console.log(heap[left(heap.indexOf(3))]);
+// 4
+console.log(heap[right(heap.indexOf(3))]);
+// 9
+console.log(heap[left(heap.indexOf(5))]);
+// 6
+

After this, it should be crystal clear that we can think in terms of a tree, but do the actual operations on an Array representing the same thing.
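
The same index arithmetic is enough to check whether an array of numbers obeys the heap property. This is a minimal sketch with a hypothetical helper `isMinHeap`. The second property needs no check here, because an array without holes is always filled from the left.

```javascript
function parent(i) { return Math.floor((i - 1) / 2); }

// Returns true if every element is greater than or equal to its parent.
function isMinHeap(array) {
    for (var i = 1; i < array.length; i++) {
        if (array[parent(i)] > array[i]) {
            return false;
        }
    }
    return true;
}

console.log(isMinHeap([1, 3, 5, 4, 9, 6]));
// true
console.log(isMinHeap([1, 3, 5, 2, 9, 6]));
// false, because 2 sits below its parent 3
```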

Heap operations

Now that we understand how to store the heap, we can take a look at the operations the heap supports, their time complexity, and how to implement them. First, here’s an overview of the supported operations:

Min(): Returns the MIN element, that is the one with the lowest key, $O(1)$.
Insert(key, value): Inserts a value into the heap under a given key, $O(\log n)$.
ExtractMin(): Returns the MIN element and removes it from the heap, $O(\log n)$.

Earlier in this article, we implemented the heap as an Array of Numbers. But in real life, we will almost always want to store actual objects in the heap, and order them by a given key. This can be done in multiple ways. We can either store an Array of objects that look something like { key: X, value: the_actual_object }, or we can use a mapping function that the heap uses to map each element to its key. For example function(user) { return user.id; } could be used to map users to their respective user.id. I’m going to assume the first example, since it makes the implementation a bit shorter, but both ways should be equally easy to implement.

Min()

Returning the MIN element is an easy operation thanks to the heap property. We already know it is present in the root, which means we simply return the element at index 0 in our array. The time complexity of this operation is $O(1)$.

function min(heap) {
+    // Note that we're expecting the `{ key: X, value: the_actual_object }` format,
+    // which is why we're returning `.value` here.
+    return heap[0].value;
+}
+

Insert(key, value)

Adding an element is rather simple. First we add it to the end of our Array, which is semantically equivalent to adding a new leaf to the last layer of the tree (as far left as possible). This can, however, break the heap property, which says that the key of each node must be lower than or equal to the keys of its children. We can easily fix this by bubbling the newly added node up: we check if its key is lower than its parent’s key, swap them if it is, and repeat from the new position.

function insert(heap, key, value) {
+    var node = { key: key, value: value };
+
+    heap.push(node);
+    var index = heap.length - 1;
+
+    while (index > 0) {
+        var parentIndex = parent(index); // here comes the `Math.floor((i - 1) / 2)`
+
+        if (heap[index].key < heap[parentIndex].key) {
+            var tmp = heap[index];
+            heap[index] = heap[parentIndex];
+            heap[parentIndex] = tmp;
+
+            index = parentIndex;
+        } else {
+            // We can stop up-propagating since the rest of the tree already
+            // obeys the heap property and the upper nodes would have even lower keys
+            // than our direct ancestor.
+            break;
+        }
+    }
+}
+
+var heap = [];
+
+insert(heap, 2, "b");
+insert(heap, 3, "c");
+insert(heap, 1, "a");
+
+console.log(heap);
+// 0: {key: 1, value: "a"}
+// 1: {key: 3, value: "c"}
+// 2: {key: 2, value: "b"}
+

We can see that 1 is in the root as expected, 3 is the left child, and 2 is the right child. This is because after the first two inserts, right when we added 1 as a new leaf, the heap looked like this (the nodes are labeled as key[value] for simplicity):

+ +

The insert call then runs the up-propagation, which swaps 2 and 1, creating the final shape of the tree.

+ +

Adding 0 to the heap would cause yet another round of up-propagation.

var heap = [];
+
+insert(heap, 2, "b");
+insert(heap, 3, "c");
+insert(heap, 1, "a");
+insert(heap, 0, "~");
+
+console.log(heap);
+// 0: {key: 0, value: "~"}
+// 1: {key: 1, value: "a"}
+// 2: {key: 2, value: "b"}
+// 3: {key: 3, value: "c"}
+

Initially before the up-propagation the heap would look like this, breaking the heap property:

+ +

but after up-propagating the 0[~] element up we get a proper MIN-heap:

+ +

Because of the second property of a binary heap, we can think of a complete binary tree of the same depth as an upper bound on the shape of the heap. From this, we can easily derive that the depth of a complete binary tree is $\log n$. The Insert operation only traverses the tree once, going from a leaf to the root, which is $\log n$ layers. The whole Insert operation is thus also $O(\log n)$.
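
For comparison, the mapping-function variant mentioned in the overview of the operations could look roughly like this. The name `insertBy` and the `keyOf` parameter are assumptions for this sketch. The heap stores the objects directly instead of `{ key, value }` wrappers.

```javascript
function parent(i) { return Math.floor((i - 1) / 2); }

// keyOf maps an element to the key it should be ordered by.
function insertBy(heap, element, keyOf) {
    heap.push(element);
    var index = heap.length - 1;

    while (index > 0) {
        var parentIndex = parent(index);

        // Stop as soon as the parent's key is not greater than ours.
        if (keyOf(heap[index]) >= keyOf(heap[parentIndex])) {
            break;
        }

        var tmp = heap[index];
        heap[index] = heap[parentIndex];
        heap[parentIndex] = tmp;
        index = parentIndex;
    }
}

var users = [];
function userId(user) { return user.id; }

insertBy(users, { id: 7, name: "Carol" }, userId);
insertBy(users, { id: 3, name: "Alice" }, userId);
insertBy(users, { id: 5, name: "Bob" }, userId);

console.log(users[0].name);
// Alice
```

The trade-off is that `keyOf` is called on every comparison, whereas the `{ key, value }` format computes the key once at insertion time.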

ExtractMin()

Removing the MIN element is easier than it might seem at first. We swap the root node with the last element on the last level, remove the last element from the Array altogether (this removes the MIN element from the heap), and then propagate the new root down to fix the heap property. We have to be a bit careful here though. While propagating up only required comparing with the parent, when propagating down we have to check against both children, and swap with the smaller one. Why? A simple example will explain:

+ +

Let’s say a > b and a > c, which means a needs to be propagated down. If we picked one of the children at random (or always the left one for example), we could break the heap property. Here’s how the tree would look after swapping a with b.

+ +

We might have fixed the relationship between a and b, but if we also had b > c originally, we would still have a tree that does not satisfy the heap property. We can fix this easily by looking at both the children, comparing them against each other, and swapping with the smaller one. That way, we would’ve swapped a with c, making c the new MIN root. This would be fine, because c < b.
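
The same argument with concrete numbers, as a minimal sketch: a = 5 sits at the root, with children b = 3 and c = 2. The helpers `swapped` and `isValidRoot` are hypothetical and only deal with a three-element heap.

```javascript
// Returns a copy of the heap with elements i and j exchanged.
function swapped(heap, i, j) {
    var copy = heap.slice();
    var tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
    return copy;
}

// Checks the heap property for a three-element heap [root, left, right].
function isValidRoot(heap) {
    return heap[0] <= heap[1] && heap[0] <= heap[2];
}

var heap = [5, 3, 2];

// Always swapping with the left child, regardless of its value.
var withLeft = swapped(heap, 0, 1);
// Swapping with the smaller of the two children.
var smaller = heap[1] < heap[2] ? 1 : 2;
var withSmaller = swapped(heap, 0, smaller);

console.log(withLeft, isValidRoot(withLeft));
// [ 3, 5, 2 ] false
console.log(withSmaller, isValidRoot(withSmaller));
// [ 2, 3, 5 ] true
```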

To clarify this better, let’s look at an example of doing the whole ExtractMin() operation on a small heap.

+ +

First ExtractMin() swaps the root with the last element in the last layer.

+ +

Then it removes the 1 from the heap altogether, though the heap property is still broken.

+ +

Next, it swaps the root down with the smaller of the two children at the second layer. Note that this is legal, because 2 < 3.

+ +

It then continues further down, until there are no more swaps needed, or until it reaches the last layer.

+ +

Knowing how the operation works, we can implement it in a fairly straightforward way. This example also includes all of the necessary above code to make it easier to test out in the console and understand the heap as a whole.

function left(i) { return 2*i + 1; }
+function right(i) { return 2*i + 2; }
+function parent(i) { return Math.floor((i - 1) / 2); }
+
+function min(heap) {
+    // Note that we're expecting the `{ key: X, value: the_actual_object }` format,
+    // which is why we're returning `.value` here.
+    return heap[0].value;
+}
+
+function insert(heap, key, value) {
+    var node = { key: key, value: value };
+
+    heap.push(node);
+    var index = heap.length - 1;
+
+    while (index > 0) {
+        var parentIndex = parent(index); // here comes the `Math.floor((i - 1) / 2)`
+
+        if (heap[index].key < heap[parentIndex].key) {
+            var tmp = heap[index];
+            heap[index] = heap[parentIndex];
+            heap[parentIndex] = tmp;
+
+            index = parentIndex;
+        } else {
+            // We can stop up-propagating since the rest of the tree already
+            // obeys the heap property and the upper nodes would have even lower keys
+            // than our direct ancestor.
+            break;
+        }
+    }
+}
+
+function extractMin(heap) {
+    var result = min(heap);
+
+    // If there is only one element in the heap, that being the minimum, we can just clear it.
+    if (heap.length == 1) {
+        heap.splice(0, 1);
+        return result;
+    }
+    
+    // We copy the last element to the root and remove the last element.
+    // There is no need to do an actual swap as we showed in the examples above,
+    // since the last element is going to get removed immediately afterwards.
+    heap[0] = heap[heap.length - 1];
+    heap.splice(heap.length - 1, 1);
+
+    bubbleDown(heap, 0);
+
+    return result;
+}
+
+function bubbleDown(heap, index) {
+    var leftIndex = left(index);
+    var rightIndex = right(index);
+
+    var smallest = index;
+
+    if (leftIndex < heap.length && heap[leftIndex].key < heap[smallest].key) {
+        smallest = leftIndex;
+    }
+    if (rightIndex < heap.length && heap[rightIndex].key < heap[smallest].key) {
+        smallest = rightIndex;
+    }
+
+    if (index != smallest) {
+        var tmp = heap[index];
+        heap[index] = heap[smallest];
+        heap[smallest] = tmp;
+
+        bubbleDown(heap, smallest);
+    }
+}
+
+var heap = [];
+
+insert(heap, 2, "b");
+insert(heap, 3, "c");
+insert(heap, 1, "a");
+insert(heap, 0, "~");
+
+console.log(extractMin(heap));
+// "~"
+console.log(heap);
+// 0: {key: 1, value: "a"}
+// 1: {key: 3, value: "c"}
+// 2: {key: 2, value: "b"}
+
+console.log(extractMin(heap));
+// "a"
+console.log(heap);
+// 0: {key: 2, value: "b"}
+// 1: {key: 3, value: "c"}
+
+console.log(extractMin(heap));
+// "b"
+console.log(heap);
+// 0: {key: 3, value: "c"}
+
+console.log(extractMin(heap));
+// "c"
+console.log(heap);
+// []
+

Same as with the Insert case, ExtractMin also traverses the tree at most once, going from the root to a leaf, which means the time complexity is also $O(\log n)$.

Conclusion

In the beginning we mentioned that a binary heap is also often used as a priority queue. A good example here might be a task scheduler which always takes the task with the highest priority and runs it. Adding new tasks to the priority queue and extracting the highest priority one would both have $O(\log n)$ time complexity, which makes it very fast even as the number of tasks grows larger. Another thing to note, the Array based implementation has all the benefits of a tree data structure without having a bunch of objects floating around on the memory heap.

Lastly, the binary heap is not the only heap there is, even though it’s probably the most common one. Two other examples are Binomial and Fibonacci heaps, which are very different from the binary heap, and provide some very interesting time complexities on their operations. For example, inserting an element into a Fibonacci heap is $O(1)$. There are also two operations which we didn’t cover in this article: Decrease, which changes the key of an element in the heap, and Merge, which takes two heaps and merges them together into one. The reason we didn’t cover them is that Decrease requires some additional handling to be useful, and Merge in and of itself isn’t so common. But those two are also an area where a Fibonacci heap provides (amortized) constant time complexity, while the binary heap is only $O(\log n)$ for Decrease and $O(n)$ for Merge.


Few notes on the Binomial and Fibonacci heaps

Tue, 23 Jan 2018 10:00:00 +0100

Having just implemented and tested a Fibonacci heap on a large dataset I thought I’d write up a bit about it, mostly so that I can reference this post later in the future, and to help me remember things I’ve learned better. Note that this blog post is not a tutorial on how to implement a Binomial/Fibonacci heap.

First, let’s begin with a few definitions. Throughout the article we’ll be talking about min-heaps. The heap property of a tree says that the value in each node is less than or equal to the value of its children. It doesn’t say which values go to the left and which go to the right, so it doesn’t help us with searching. It only tells us that the minimum of any subtree is in its root. Also note that we are not restricting ourselves just to binary trees, this property works for any kind of tree.

Binomial heap

Before we can define a binomial heap, we need to define a binomial tree. We’ll use a recursive definition.

A binomial tree of rank 0 is a single node without any children.
A binomial tree of rank k is a tree where the root has exactly k children, which are, going from left to right, binomial trees of rank 0..k-1.

To make things a little more confusing, here’s a picture from Wikipedia, which uses a reverse order, putting children of lower rank to the right. Both definitions are equivalent.

Before we move onto the binomial heap, let us prove a small property which will be useful later: a binomial tree of rank $k$ has $2^k$ nodes. For $k=0$ we get $2^0 = 1$, which is true. Now taking $k>0$, we assume by induction that a binomial tree of rank $k-1$ has $2^{k-1}$ nodes. We also know that we can take two trees of rank $k-1$ and combine them into a binomial tree of rank $k$, by attaching one of them as the last child of the other one’s root.

If we do that, we get $2 \cdot 2^{k-1} = 2^k$ nodes in total. This shows that the number of children is logarithmic in the total number of nodes. Since increasing the rank by one increases the depth of the tree by one as well, we get that a binomial tree with $n$ nodes has the depth of $O(\log n)$ and also has $O(\log n)$ children at the root.

A small sidenote, when we merge two binomial trees, in order to preserve the heap property, we have to put the one with a higher value in its root under the one with the lower value.
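
To make the linking step a bit more concrete, here is a minimal sketch. The node shape `{ key, rank, children }` and the names `singleton`, `link` and `size` are my own assumptions for illustration, not part of any particular library.

```javascript
// Hypothetical node shape: { key, rank, children }.
function singleton(key) {
    return { key: key, rank: 0, children: [] };
}

// Link two binomial trees of the same rank into one tree of rank + 1.
// The root with the smaller key stays on top, preserving the heap property.
function link(a, b) {
    if (a.rank !== b.rank) {
        throw new Error("can only link trees of equal rank");
    }
    var top = a.key <= b.key ? a : b;
    var bottom = top === a ? b : a;
    top.children.push(bottom); // children stay ordered by rank 0..k-1
    top.rank += 1;
    return top;
}

// Count the nodes of a tree recursively.
function size(tree) {
    var total = 1;
    for (var i = 0; i < tree.children.length; i++) {
        total += size(tree.children[i]);
    }
    return total;
}

var t1 = link(singleton(5), singleton(3)); // rank 1
var t2 = link(singleton(8), singleton(1)); // rank 1
var t = link(t1, t2);                      // rank 2

console.log(t.key, t.rank, size(t)); // 1 2 4
```

The resulting tree of rank 2 has $2^2 = 4$ nodes, and the smallest key ends up in the root.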

Now moving onto a binomial heap, we define it as a list of binomial trees T1,…,Tk, which are sorted by their rank, each rank from 0 to $k$ occurs at most once, and each tree obeys the heap property.

The operations we want from the binomial heap are the following:

Min: Finding the minimum, which can be either obtained in $O(\log n)$ by iterating the roots, or in $O(1)$ by keeping a separate minimum pointer and updating it alongside the other operations.
Merge: Taking two binomial heaps and merging them together. We simply iterate both lists, looking at one rank at a time. If both heaps contain a tree of the same rank, we merge them together, creating a tree of one rank higher. We keep doing this if there’s also a tree of one rank higher, much like we would carry over a 1 in binary addition. This whole operation is $O(\log n)$, as it performs the same steps as adding two binary numbers, and each heap has only $O(\log n)$ trees.
Insert: Adding a single item into the heap. We do this by creating a singleton heap with just one element and merging it into our heap. By definition this is also $O(\log n)$.
Build: Building a binomial heap from a list of $n$ elements. Unlike in a binary heap, we can simply call Insert for each element. I won’t go into why, but the complexity is just $O(n)$, not $O(n \log n)$ as we could expect.
ExtractMin: Removing a minimum from the heap, we take the tree with the minimum value at its root, remove it from the heap, create a new heap into which we insert all of its children, and merge that heap back into our initial heap. The whole operation is again $O(\log n)$.
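
The operations above can be sketched in a few lines if we store the heap as an array indexed by rank. This is only one possible representation, chosen to make the analogy with binary addition visible. All of the names (`singleton`, `link`, `merge`, `insert`, `extractMin`) and the node shape are assumptions made for this example.

```javascript
// Hypothetical node shape: { key, rank, children }, children ordered by rank.
function singleton(key) {
    return { key: key, rank: 0, children: [] };
}

// Link two trees of equal rank; the smaller root stays on top.
function link(a, b) {
    var top = a.key <= b.key ? a : b;
    var bottom = top === a ? b : a;
    top.children.push(bottom);
    top.rank += 1;
    return top;
}

// A heap is an array where slot r holds the tree of rank r, or null.
// Merging works like binary addition with a carry. The input trees are
// reused (and mutated), so the input heaps should not be used afterwards.
function merge(h1, h2) {
    var result = [];
    var carry = null;
    var maxRank = Math.max(h1.length, h2.length);
    for (var r = 0; r < maxRank || carry !== null; r++) {
        var present = [];
        if (h1[r]) present.push(h1[r]);
        if (h2[r]) present.push(h2[r]);
        if (carry) present.push(carry);
        // With 1 or 3 trees, one of them stays at this rank.
        result[r] = present.length % 2 === 1 ? present.pop() : null;
        // The remaining two (if any) are linked and carried to rank r + 1.
        carry = present.length === 2 ? link(present[0], present[1]) : null;
    }
    return result;
}

function insert(heap, key) {
    return merge(heap, [singleton(key)]);
}

// Returns { key, heap }, or null if the heap is empty.
function extractMin(heap) {
    var minRank = -1;
    for (var r = 0; r < heap.length; r++) {
        if (heap[r] && (minRank === -1 || heap[r].key < heap[minRank].key)) {
            minRank = r;
        }
    }
    if (minRank === -1) return null;
    var minTree = heap[minRank];
    var rest = heap.slice();
    rest[minRank] = null;
    // The children of a rank k tree have ranks 0..k-1 in order,
    // so the children array is itself a valid heap in this representation.
    return { key: minTree.key, heap: merge(rest, minTree.children) };
}

var heap = [];
[5, 3, 8, 1, 7].forEach(function (key) {
    heap = insert(heap, key);
});

var sorted = [];
for (var i = 0; i < 5; i++) {
    var extracted = extractMin(heap);
    sorted.push(extracted.key);
    heap = extracted.heap;
}
console.log(sorted); // [ 1, 3, 5, 7, 8 ]
```

Note that this sketch finds the minimum by scanning the roots, so Min is $O(\log n)$ here. Keeping a separate minimum pointer would be a straightforward addition.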

Lazy binomial heap

We can go a bit further and make our binomial heap lazy. This will help us improve the amortized time of some of the operations. The only change we’re going to make is allow multiple trees of the same rank to co-exist in our binomial heap.

We’ll simplify our Merge operation so that it works in constant time. Instead of doing all of those complicated operations, we simply take the lists of trees of both heaps and concatenate them together. This can be done in $O(1)$ when using doubly linked lists. (Note that the minimum pointer should be updated to the minimum of both heaps when doing this.)

We also modify our ExtractMin operation so that it performs a new operation called Consolidation. This will fix our heap so that it again looks like a binomial heap. In a nutshell, we’ll do a bucket sort on the list of trees in our heap, using the rank as the bucket index, and then merge trees in each bucket until there’s at most one left (note that when we merge two trees we create a tree of a higher rank, so we move it one bucket up). Iterating the buckets from lower to higher ranks will result in a list of buckets where each bucket contains zero or one tree. We can convert this back to a binomial heap.

The whole trick is that the consolidation itself is $O(\log n)$ (amortized), so we keep the amortized complexity of ExtractMin, but improve the complexity of Insert and Merge to $O(1)$. Also note that the worst case time of ExtractMin is $O(n)$.
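
Here is a rough sketch of what the consolidation could look like. It assumes the lazy heap is just a plain array of roots, with a hypothetical node shape `{ key, rank, children }`. A real implementation would use a doubly linked list to get the $O(1)$ concatenation, which a JavaScript array doesn’t give us.

```javascript
// Hypothetical node shape: { key, rank, children }.
function singleton(key) {
    return { key: key, rank: 0, children: [] };
}

// Link two trees of equal rank; the smaller root stays on top.
function link(a, b) {
    var top = a.key <= b.key ? a : b;
    var bottom = top === a ? b : a;
    top.children.push(bottom);
    top.rank += 1;
    return top;
}

// Takes a list of roots with possibly repeated ranks and returns a list
// where each rank occurs at most once, ordered from lowest to highest rank.
function consolidate(roots) {
    var buckets = []; // buckets[r] is the list of trees of rank r
    roots.forEach(function (tree) {
        buckets[tree.rank] = buckets[tree.rank] || [];
        buckets[tree.rank].push(tree);
    });

    var result = [];
    // buckets.length is re-read on every iteration, so buckets created
    // by linking are visited as well.
    for (var r = 0; r < buckets.length; r++) {
        var bucket = buckets[r] || [];
        while (bucket.length > 1) {
            var merged = link(bucket.pop(), bucket.pop());
            buckets[r + 1] = buckets[r + 1] || [];
            buckets[r + 1].push(merged); // move one bucket up
        }
        if (bucket.length === 1) {
            result.push(bucket[0]);
        }
    }
    return result;
}

// Seven lazy inserts leave us with seven trees of rank 0.
var roots = [7, 3, 9, 1, 4, 8, 2].map(singleton);
var consolidated = consolidate(roots);

console.log(consolidated.map(function (tree) { return tree.rank; }));
// [ 0, 1, 2 ]
```

Seven elements end up as trees of rank 0, 1 and 2, which matches 7 being 111 in binary.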

Fibonacci heap

Going even further, we want our heap to also support the Decrease operation, which takes a pointer to a node and changes its key to a specific value. As doing this blindly could break the heap property, we have to do some tweaking to our data structure.

A regular binomial heap can do a Decrease in $O(\log n)$ by simply propagating the decreased element as far up towards the root as needed to maintain the heap property. But our Fibonacci heap will be able to do this in just $O(1)$ amortized!

To allow this, we tweak our definition of the binomial heap. We keep the heap ordering on our trees, but we don’t require them to be binomial. All of the above mentioned operations will be identical to the lazy binomial heap, with the exception of Decrease. We’ll also be keeping an additional flag on each node, which says if the node is marked.

When Decrease is called on a node, we check if it changes the key in such a way that the parent node now has a higher value. If not, we stop right here as the tree keeps the heap property. If the parent now has a higher value, we Cut the subtree at the changed node (including the changed node, acting as a root).

The Cut operation takes the subtree, removes it from its parent, and Inserts it back into the heap. If the parent was marked, we recursively call Cut on the parent. If the parent wasn’t marked, we mark it and end right there.

This means our first Decrease will simply take the subtree at the decreased node, mark its parent, and insert the subtree back into the heap. If we then call a second Decrease under the same parent node, we will end up Cutting the parent as well. This prevents the tree from becoming too degenerate.

The most interesting part here (which I’m however not going to prove), is that the amortized cost of Cut is $O(1)$, and the amortized cost of Decrease is also $O(1)$. This makes for a very interesting data structure, in which all operations except for the ExtractMin run in amortized constant time.
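
To illustrate how Decrease and Cut interact with the marks, here is a simplified sketch. The node shape `{ key, parent, children, marked }`, the `heap.roots` array and the function names are assumptions for this example. It also leaves roots unmarked, which is a common convention but isn’t spelled out above, and it doesn’t maintain the minimum pointer.

```javascript
// Hypothetical node shape: { key, parent, children, marked }.
function node(key) {
    return { key: key, parent: null, children: [], marked: false };
}

function addChild(parent, child) {
    child.parent = parent;
    parent.children.push(child);
}

// Cut the subtree rooted at `n` from its parent and make it a new root.
function cut(heap, n) {
    var parent = n.parent;
    if (parent === null) return; // already a root, nothing to cut

    // indexOf/splice is linear in the number of children; a real
    // implementation would use linked lists to make this O(1).
    parent.children.splice(parent.children.indexOf(n), 1);
    n.parent = null;
    n.marked = false;
    heap.roots.push(n);

    if (parent.parent === null) return; // assumption: roots are never marked
    if (parent.marked) {
        cut(heap, parent);    // second lost child: cut the parent as well
    } else {
        parent.marked = true; // first lost child: only mark the parent
    }
}

function decrease(heap, n, newKey) {
    n.key = newKey;
    if (n.parent !== null && n.key < n.parent.key) {
        cut(heap, n);
    }
}

// A small tree: 1 -> 5 -> (10, 12)
var root = node(1);
var a = node(5);
var b = node(10);
var c = node(12);
addChild(root, a);
addChild(a, b);
addChild(a, c);
var heap = { roots: [root] };

decrease(heap, b, 2);
console.log(a.marked); // true, `a` has lost its first child

decrease(heap, c, 3);
console.log(heap.roots.map(function (n) { return n.key; }));
// [ 1, 2, 3, 5 ], the second cut also cut `a` itself
```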

Conclusion

I do realize that I’ve skipped most of the amortized analysis, and simplified a few things, but this article mostly serves as a mental refresher for people already somewhat familiar with the Fibonacci heap. For a complete reference, check out the references at the Wikipedia page.


Bloom filter in JavaScript

Thu, 18 Jan 2018 10:00:00 +0100

This article assumes basic familiarity with bit vectors. If you’re unsure how they work or need a refresher, check out the previous article about bit vectors which goes in depth both in explaining how they work, and how to implement one.

A Bloom filter is a simple yet powerful data structure. It allows you to answer the question have you seen this before? in a very fast and memory efficient way. Unlike a regular set, a Bloom filter only needs to store a few bits per item instead of storing the whole thing. The trick is, a Bloom filter will be able to tell you if something is not present in the set with 100% certainty, but if you ask it if something is present in the set, you might get a false positive. That means the response could be true, even if the item was never stored in the set.

To explain things, let’s first do a simple example. Consider we’re taking random numbers as input and checking if we’ve already seen a given number. We could use an array and store the numbers and check its contents whenever needed.

var arr = [];
+
+// inserting a few elements in the array
+arr.push(3);
+arr.push(5);
+
+// and checking for presence
+console.log(arr.indexOf(3) !== -1)
+console.log(arr.indexOf(4) !== -1)
+

arr.indexOf(...) returns the index of an element in the array if found, and -1 if the element was not present in the array.

This approach works, but it has a few issues. Firstly, indexOf needs to traverse the entire array to see if an element is present ($O(n)$ time complexity), which means the bigger the array, the longer it will take. And second, we’re also using extra memory for each element we add to the array. This might seem dumb at first, considering we haven’t gotten to the Bloom filter part, but read on to see how we can save a lot of memory with a probabilistic approach. But first, let’s try fixing the lookup time by switching to a hash table. We could use JavaScript’s builtin objects to store a truthy value.

var table = {};
+
+table[3] = true;
+table[5] = true;
+
+console.log(table[3]);
+console.log(table[4]);
+

This brings the lookup time down to an average constant time $O(1)$. The thing is, if we were to store a lot of numbers, it would take up a lot of memory. Considering we’re only storing numbers here, you might be thinking that we could use a bit vector, which would only take 1 bit per number. If we were to store something like IPv4 addresses encoded as 32-bit integers, we would need to allocate a 512MB bit vector ($2^{32}$ bits). That might still be feasible, as there are only $2^{32}$ possible values. Increasing this to IPv6 (which are 128-bit integers), the bit vector would be way too big (roughly $4 \cdot 10^{37}$ bytes).

Here comes the interesting part though, what if we only intend to store a portion of the numbers. If it was small enough, we could simply store them in an array. If it was bigger, we’d probably use a hash table or ideally a set data structure. But what if we need to store lots of them?

Storing 5 million IPv6 addresses (128-bit numbers) would take up around 76MB of memory. That might not seem like a huge deal if all you’re doing is managing which IPv6 addresses you’ve seen before. But what if this is part of a side calculation that isn’t particularly important? Or what if you need to store even more? The storage requirements grow linearly with each address. At 50 million we’d be closing in on 1GB of memory.

Using a probabilistic approach to save memory

If we’re willing to relax our requirements on the data structure, we can save quite a bit of memory. The original bit vector approach is great in terms of memory, but fails when the set of possible values is too large, as we need to pre-allocate a slot for each possible value. But what if we allowed multiple values to share the same slot?

We can use a simple hash function to calculate the position in the bit vector, which allows us to have a bit vector that is smaller than the set of possible values. If we have 100 possible values, but only 10 bits, we can use a modulo 10 hash function to set the proper bit to true.

function hash(num) {
+    return num % 10;
+}
+

This approach does decrease the amount of memory required by 10x, but it’s also easy to see that collisions in the hash function can occur rather easily. For example 1, 11, 21, 31, … all have the same hash value. At first this seems like our data structure is completely broken and can’t tell us anything useful, but something interesting happens here.

When we query the data structure, there are two cases that can happen:

The queried bit is true, which means the number we used to calculate the hash might have been used to set it, or it could have been a different number with the same hash.
The queried bit is false, which means the number was never present in the set! Inserting only ever sets bits to true, so there is no way a different number could have set this bit to false.

There is a big difference between cases 1) and 2). While the first case gives a probabilistic response with a possible false positive (getting true while the answer should be false), the second case is always 100% correct. If we get a false, the element definitely isn’t in the set!
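
A tiny experiment shows both cases. This sketch uses a plain array of 10 booleans in place of a real bit vector, together with the modulo 10 hash from above. The names `insert` and `mightContain` are made up for this example.

```javascript
// 10 slots, all initially false. A plain array stands in for a bit vector.
var bits = [];
for (var i = 0; i < 10; i++) {
    bits.push(false);
}

function hash(num) {
    return num % 10;
}

function insert(num) {
    bits[hash(num)] = true;
}

// true means "maybe in the set", false means "definitely not in the set".
function mightContain(num) {
    return bits[hash(num)];
}

insert(3);
insert(15);

console.log(mightContain(3));  // true, it really was inserted
console.log(mightContain(13)); // true, a false positive (13 % 10 === 3)
console.log(mightContain(4));  // false, definitely never inserted
```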

Before we move on to how an actual bloom filter works, let’s see a few examples where a data structure that can return false positives but never returns false negatives can be useful:

Tracking which pages/articles/profiles/websites a user has seen and which they haven’t. This way you can recommend things they definitely have not seen.
A simple web crawler where you want to avoid visiting the same pages again and again. You will 100% recognize if the crawler has not seen a particular page, with a possibility of skipping a few pages due to a false positive.
Mapping tags/keywords to large data files/database where you might want to avoid a possibly expensive search if the keyword does not match the contents.
Any case where you want a set which should not be enumerable. For example, when storing privacy sensitive data, using the approach mentioned above doesn’t allow anyone to figure out which specific elements are in the set, as there are many possibilities for each slot.

Bloom filter

In our previous example, the probability of a collision (two elements sharing the same hash) is very high, and such a simple hash function also yields predictable collisions, which is something we’d want to avoid. The reason is that the number of collisions could be very high for a certain set of items while being low for a different set, leading to an uneven distribution of false positives across the possible sets of items.

First, it is important to note that the above shown hash function is not a good example of a hash function. I won’t go into the details of constructing a hash function in this article, as it’s a rather involved topic and I can’t really think of any cases where you’d need to come up with your own. For now, let’s simply consider we have three different hash functions called h1, h2 and h3.

To create a bloom filter, we simply need a bit vector and k different hash functions, and use them to calculate multiple bit indexes for each element. That way if two elements have a collision using one hash function, they don’t necessarily have a collision with the other ones, as each of the hash functions is different and returns a different index.

// Here we assume that `bitvector` is an N-bit long bit vector. See references at the end of this
+// article if you're unsure on how it works.
+function insert(bitvector, value) {
+    // We simply set all the relevant bits to `1`
+    bitvector.set(h1(value), 1);
+    bitvector.set(h2(value), 1);
+    bitvector.set(h3(value), 1);
+}
+
+function isMember(bitvector, value) {
+    var bit1 = bitvector.get(h1(value));
+    var bit2 = bitvector.get(h2(value));
+    var bit3 = bitvector.get(h3(value));
+
+    // An element is in the set only if all of the relevant bits are `1`
+    return bit1 && bit2 && bit3;
+}
+

Now the question is, does having multiple hash functions reduce the probability of a false positive? Sure the probability that three hash functions collide all at once is lower than the probability of just one hash function colliding, but we’re also setting three times as many bits.

If you’re not interested in the math behind calculating the probabilities of collisions, feel free to skip to the next section for a high level overview.

If we have a bit vector with N bits, the probability that a hash function selects a specific bit is $\frac{1}{N}$. This means, that if we do a single insert with a single hash function, the probability that a specific bit is kept at 0 is $1 - \frac{1}{N}$. If we use k different hash functions, a single bit is kept at 0 if none of the hash functions choose it, which means the above test has to pass for all of them, which means the probability is $(1 - \frac{1}{N})^k$.

If we start inserting multiple elements, the probability that a particular bit is 0 after n elements have been inserted is again calculated by looking at this as passing the above test for each of the inserted elements (done n times), so the probability is $((1 - \frac{1}{N})^k)^n = (1 - \frac{1}{N})^{kn}$.

The complement of this is $1 - (1 - \frac{1}{N})^{kn}$, which is the probability that a specific bit is 1 after having inserted n elements. We get a false positive if and only if all of the hash functions select indexes which are all 1, which means the above test needs to pass for all of them, giving us the final probability:

$(1 - (1 - \frac{1}{N})^{kn})^k \approx (1 - e^{-\frac{kn}{N}})^k$

Using a bit of analysis, we can find that the minimum is at $\frac{kn}{N} = \ln 2$, which gives us a useful relationship $k = \ln 2 \cdot \frac{N}{n}$.
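
If you’d rather play with the numbers than do the analysis, the formulas above translate directly into code. The function names below are my own, and the values of N and n are just an example.

```javascript
// Exact probability of a false positive for N bits, n inserted elements
// and k hash functions: (1 - (1 - 1/N)^(kn))^k
function falsePositiveRate(N, n, k) {
    return Math.pow(1 - Math.pow(1 - 1 / N, k * n), k);
}

// The approximation (1 - e^(-kn/N))^k
function approxFalsePositiveRate(N, n, k) {
    return Math.pow(1 - Math.exp(-k * n / N), k);
}

// The k which minimizes the approximation: k = ln 2 * N / n
function optimalK(N, n) {
    return Math.LN2 * N / n;
}

var N = 10000;
var n = 1000;

console.log(optimalK(N, n)); // about 6.93

// Print the false positive rate for a few values of k to see the minimum.
for (var k = 1; k <= 14; k++) {
    console.log(k, falsePositiveRate(N, n, k), approxFalsePositiveRate(N, n, k));
}
```

With 10 bits per element the rate keeps dropping until around k = 7, and then starts growing again as the bit vector fills up with ones.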

Choosing the right constants and hash functions

When we want to use a Bloom filter, we don’t really start from the number of hash functions. We care about the probability of false positives and how many items we are approximately going to store in the filter. If we know those, it’s rather simple to calculate the number of bits and hash functions needed. There is actually a nice Bloom filter calculator which does exactly this, you put in a number of expected elements, your target probability for false positives, and the calculator will tell you the optimal number of hash functions, and how many bits the backing bit vector should have. I really recommend trying it out with a few different settings for p and n and see how the size of the Bloom filter changes to get a feel for how much memory you might need if you used one.
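
The calculation itself is short enough to sketch here. Plugging the optimal $\frac{kn}{N} = \ln 2$ into the false positive formula gives $p = (\frac{1}{2})^k$, and solving for N gives $N = -\frac{n \ln p}{(\ln 2)^2}$. The function name `bloomParameters` is made up for this example.

```javascript
// Given the expected number of elements n and the target false positive
// probability p, compute the number of bits and hash functions.
function bloomParameters(n, p) {
    // N = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    var bits = Math.ceil(-n * Math.log(p) / (Math.LN2 * Math.LN2));
    // k = ln 2 * N / n, rounded to the nearest whole hash function
    var hashes = Math.max(1, Math.round(Math.LN2 * bits / n));
    return { bits: bits, hashes: hashes };
}

// 5 million elements with a 1% chance of a false positive
var params = bloomParameters(5000000, 0.01);

console.log(params.hashes);                 // 7
console.log(params.bits / 8 / 1000 / 1000); // size in MB, about 6
```

For the 5 million IPv6 addresses from the beginning of the article, this comes out at roughly 6MB instead of 76MB, if we can live with 1% of false positives.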

Now all that is left to do is figure out where to get those k different hash functions. A simple solution suggested by this paper is to create k hash functions from just two different ones (let’s call them f and g). We calculate the i-th hash of a value x as h_i(x) = (f(x) + i * g(x)) % N, where N is the number of bits in our bit vector.

An important point to note here is that we only need the hash function to be uniform, we don’t need a cryptographically secure hash function such as SHA-2. We can use a simpler and faster hash function, such as FNV (here and here), which simply does a few bitwise operations and multiplications. There are actually two variants, FNV-1 and FNV-1a, which are almost the same, except for the order of one particular operation. This means we can use these two hash functions as a basis for the trick mentioned above. You can take a look at the bloomfilter.js library which has an efficient implementation of both the Bloom filter, and of both variants of the FNV hash function.
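
Here is a minimal sketch of the whole trick, using 32-bit FNV-1 and FNV-1a as f and g. This is not how bloomfilter.js does it, just my own simplified version. It assumes the input is a string of single-byte (ASCII) characters, and uses a plain array of booleans instead of a real bit vector. The names `indexes`, `createFilter`, `insert` and `isMember` are made up for this example.

```javascript
var FNV_OFFSET_BASIS = 0x811c9dc5; // standard 32-bit FNV offset basis
var FNV_PRIME = 0x01000193;        // standard 32-bit FNV prime

// FNV-1: multiply first, then XOR with the next byte.
function fnv1(str) {
    var hash = FNV_OFFSET_BASIS;
    for (var i = 0; i < str.length; i++) {
        hash = Math.imul(hash, FNV_PRIME); // 32-bit multiplication
        hash ^= str.charCodeAt(i) & 0xff;  // assumes single-byte characters
    }
    return hash >>> 0; // convert to an unsigned 32-bit value
}

// FNV-1a: XOR with the next byte first, then multiply.
function fnv1a(str) {
    var hash = FNV_OFFSET_BASIS;
    for (var i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i) & 0xff;
        hash = Math.imul(hash, FNV_PRIME);
    }
    return hash >>> 0;
}

// The i-th index is (f(x) + i * g(x)) % N.
function indexes(value, k, N) {
    var f = fnv1(value);
    var g = fnv1a(value);
    var result = [];
    for (var i = 0; i < k; i++) {
        result.push((f + i * g) % N);
    }
    return result;
}

function createFilter(N, k) {
    var bits = [];
    for (var i = 0; i < N; i++) {
        bits.push(false);
    }
    return { bits: bits, k: k };
}

function insert(filter, value) {
    indexes(value, filter.k, filter.bits.length).forEach(function (index) {
        filter.bits[index] = true;
    });
}

function isMember(filter, value) {
    return indexes(value, filter.k, filter.bits.length).every(function (index) {
        return filter.bits[index];
    });
}

var filter = createFilter(1000, 3);
insert(filter, "hello");

console.log(isMember(filter, "hello")); // true
console.log(fnv1a("a").toString(16));   // e40c292c
```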

References

Bit Vector in JavaScript

Wed, 17 Jan 2018 10:00:00 +0100

A bit vector (also known as bit set or bit array) is a set data structure which uses only 1 bit per element. It answers the question is X in the set?.

The main advantage is memory efficiency and speed, as the memory is allocated in a single continuous block, making it very cache friendly (unlike some tree based data structures), and requiring only a few bitwise operations to access/modify elements in the set. The disadvantage is that we need to know the size of the bit vector beforehand, and that we might be wasting some of the memory if we only store a few elements in a large vector. Let’s look at this more closely.

For simplicity, we can think of the bit vector as an array of bits. Not boolean true/false values, or bytes, but bits. Our goal is to map each of the possible values we might want to store (together called the domain) to a unique index in the bit vector. A good example would be if we wanted a set of small integers (say for an algorithm like the prime sieve of Eratosthenes). We would then need one bit for every integer we might want to store. If we want to store the integers from 0 to 1023, our vector would need 1024 bits, or 128 bytes, to store all our membership values (flags).

As a small sidenote, you can use the Chrome developer console to run the example code. It was written in a way that you can copy paste each snippet as you go along and everything will work.

Implementation using basic `Array` and `Number` types

Let’s do a simple implementation first, using bare JavaScript arrays of Number. We can do this because the JavaScript bitwise operators treat their operands as 32 bit integers. Afterwards, we’ll do the same using the new Uint32Array type. But first, let’s use a regular Array to make things simpler.

To create a bit vector, we first need to specify the number of bits we need (which is also the number of possible values we can store membership of). The question becomes, how many 32-bit integers do we need to store N bits? The answer is unsurprisingly N / 32, rounded up.

// A function which takes a number of bits and returns an initialized
+// bit vector of given length.
+function buildVector(bitCount) {
+    // The number of bits each `Number` can store.
+    var BITS_PER_ELEMENT = 32;
+
+    // Total number of `Number` values in the vector.
+    // We round up, because even if we need less than 32 bits, we need at least 1 `Number`.
+    var elementCount = Math.ceil(bitCount / BITS_PER_ELEMENT);
+
+    var vector = new Array(elementCount);
+
+    // We initialize our bit vector to all zeros
+    for (var i = 0; i < elementCount; i++) {
+        vector[i] = 0;
+    }
+
+    return vector;
+}
+

Now that we have our bit vector, all that is left to do is write the get and set methods for manipulating bit values by their respective bit index. We’ll first consider only having a single Number representing 32 bits.

Short introduction to binary

Binary numbers are represented as a sum of powers of two, for example:

$1 = 2^0$
$2 = 2^1$
$3 = 1 + 2 = 2^0 + 2^1$
$4 = 2^2$
…

If you don’t have much experience with binary, you might be tempted to think that $4 = 1 + 3 = 2^0 + (2^0 + 2^1)$, but that wouldn’t work, as we only want one of each power of two. A simple rule to achieve this is that we do a greedy approach, starting from the biggest power of two we can.

7 … we can fit a 4 into that, which means it’s $4 + 3$, and we have to convert the 3, which is $2 + 1$, resulting in $4 + 2 + 1$ or $2^2 + 2^1 + 2^0$
14 … we can fit an 8, leaving us with 6, on which we iterate the same rule and get $4 + 2$, resulting in $8 + 4 + 2$ or $2^3 + 2^2 + 2^1$

A binary number is then simply a sequence of 0 or 1, stating 1 for each power of 2 we have (going from the lowest from the right), and 0 for each one we don’t have, thus:

$7 = 2^2 + 2^1 + 2^0$ which we can write as $1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0$, which gives us $111$
$14 = 2^3 + 2^2 + 2^1$ which we can write as $1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0$, which gives us $1110$ reading from the right

Note that we can add any number of zeros to the left, so 111 is equivalent to 0111 and 000000111.
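
The greedy rule is simple enough to write down as code. This is just a sketch for non-negative integers to illustrate the idea (the name `toBinary` is made up). In practice you’d use the built-in `toString(2)`.

```javascript
// Convert a non-negative integer to a binary string using the greedy rule.
function toBinary(n) {
    if (n === 0) {
        return "0";
    }

    // Find the biggest power of two that fits into n.
    var power = 1;
    while (power * 2 <= n) {
        power *= 2;
    }

    // Go from the biggest power down to 2^0, writing 1 for each power
    // we can fit in and 0 for each one we can't.
    var digits = "";
    while (power >= 1) {
        if (power <= n) {
            digits += "1";
            n -= power;
        } else {
            digits += "0";
        }
        power /= 2;
    }
    return digits;
}

console.log(toBinary(7));  // 111
console.log(toBinary(14)); // 1110
console.log(toBinary(14) === (14).toString(2)); // true
```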

A small sidestep here, we can also use hexadecimal numbers to represent binary, as the conversion is rather simple. A hexadecimal digit represents a value from 0 to 15, which is exactly what 4 bits represent. We can thus take any binary number, such as 100000101011111010 and convert it to hex (or back):

look at it as groups of 4 bits from the right 10 0000 1010 1111 1010
add leading zeros to the leftmost group if needed 0010 0000 1010 1111 1010
convert each individual group 2 0 10 15 10 to decimal
write each decimal in hex 2 0 A F A
put them back together, prefixing with 0x to get 0x20AFA

Converting hexadecimal back to binary is simple, just take these steps backwards. The reason we often use hexadecimal numbers instead of binary is that they are much easier to visually parse, understand and remember. Looking at a number like 0xA1 is much clearer than looking at 10100001, because you don’t have to count how long each run of zeros/ones is.
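
The conversion steps above can be followed literally in code. The function name `binaryToHex` is made up for this sketch, and it expects a string containing only the characters 0 and 1.

```javascript
// Convert a string of binary digits to a hex string, 4 bits at a time.
function binaryToHex(binary) {
    // Add leading zeros until the length is a multiple of 4.
    while (binary.length % 4 !== 0) {
        binary = "0" + binary;
    }

    var hex = "";
    for (var i = 0; i < binary.length; i += 4) {
        var group = binary.slice(i, i + 4); // one group of 4 bits
        var value = parseInt(group, 2);     // its value from 0 to 15
        hex += value.toString(16).toUpperCase();
    }
    return "0x" + hex;
}

console.log(binaryToHex("100000101011111010")); // 0x20AFA
console.log(binaryToHex("10100001"));           // 0xA1
```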

Bitwise operators

To manipulate individual bits, we’ll make use of a few simple bitwise operators. They are called bitwise, because they manipulate individual bits. Specifically, we’ll need:

negation (NOT), using the ~ (tilde) operator, which simply flips all the bits
conjunction (AND), using the & operator, which returns 1 when both bits are 1, otherwise 0
disjunction (OR), using the | operator, which returns 0 when both bits are 0, otherwise 1
left shift, using the << operator, which is shifting all the bits to the left, or semantically multiplying by a given power of 2.

Because these operators are bitwise, they will operate on all the bits in parallel. This is different from the more common logical operators && and || (note that they’re doubled), which operate on the whole numbers. Here are a few examples (the b suffix signifies a binary string, this is not proper JavaScript syntax and only used for demonstration purposes):

1 | 2 = 01b | 10b = 11b = 3
1 || 2 = 1, because both operands are treated as booleans, 1 is truthy, and || returns the first truthy operand
1 & 3 = 01b & 11b = 01b = 1
1 && 3 = 3, because both operands are treated as booleans, both are truthy, and && then returns the last operand
1 << 1 = 1b << 1 = 10b = 2, or $1 \cdot 2^1$
1 << 3 = 1b << 3 = 1000b = 8, or $1 \cdot 2^3$
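
You can try these directly in the console. Since ES2015 you can also write binary literals using the `0b` prefix, which makes such experiments a bit easier to read.

```javascript
console.log(1 | 2);  // 3
console.log(1 || 2); // 1, the first truthy operand
console.log(1 & 3);  // 1
console.log(1 && 3); // 3, the last operand, as both are truthy
console.log(1 << 1); // 2
console.log(1 << 3); // 8

// The same with binary literals (ES2015 and newer)
console.log(0b01 | 0b10); // 3
console.log(0b01 & 0b11); // 1
```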

To understand the NOT operator, we first need to understand that while mathematically, binary numbers are infinite (or can be), we are only working with 32-bit integers. This means if we start with 0 and do a negation ~0, we get a 32 bit number with all bits set to 1.

Because JavaScript uses two’s complement, ~0 will actually be -1 (or 11111111111111111111111111111111 in binary). This is because the bitwise operators treat a Number as a signed 32-bit integer, which means it also has to represent negative values. The important thing here is that two’s complement doesn’t change the actual bits, it only specifies what value those bits represent when doing other mathematical operations (+, -, *, etc.). It also affects how the browser will display each number. If you want to learn more about two’s complement, check out the Wikipedia article or the following online calculator (there are many others) to get an idea for how it works.

But since our bit vector doesn’t need to do arithmetic, we don’t really need to worry about this. We might occasionally want to print out a given number in hex or binary, which can be done using the .toString function, for example (14).toString(2) outputs binary 1110 and (14).toString(16) outputs hex e.
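
One thing to watch out for is that `.toString(2)` on a negative number prints a minus sign followed by the absolute value, not the actual bits. To see the raw 32 bits, we can first reinterpret the number as unsigned using the unsigned right shift operator `>>>` with a shift of 0 (an operator we haven’t covered in this article).

```javascript
console.log(~0);                 // -1
console.log((~0).toString(2));   // -1, the sign is printed separately

// Reinterpret the same 32 bits as an unsigned number before printing.
console.log(~0 >>> 0);                // 4294967295
console.log((~0 >>> 0).toString(2));  // 11111111111111111111111111111111
console.log((~0 >>> 0).toString(16)); // ffffffff

console.log(~5); // -6, since ~x is the same as -x - 1 in two's complement
```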

Knowing how binary and bitwise operators work, we can finally figure out how to set a specific bit in a given 32-bit integer. The OR operator | is perfect for this, as it won’t change a 1 to 0 (and thus leaving the existing values alone), but will be able to set a 0 to 1. Counting from 0, if we want to set the 1st bit (at index 0) to 1, we simply do num | 1, as this can also be read as num | 00000000000000000000000000000001b. If we wanted to set the 2nd bit (at index 1), we’d want num | 10b, or num | 2 in decimal/hex. The 3rd bit (at index 2) would be num | 100b or num | 4, the 4th bit (at index 3) would be num | 1000b or num | 8, and so on. We’ll call the number on the right side of the operator a bit mask.

If you look closely, you can probably figure out the pattern. To set the i-th bit, we need to OR the number with $2^i$, which can be easily created with a left shift as 1 << i. The whole operation then becomes num | (1 << i). Before moving on, let’s do a more visual example. We’ll start with num = 0xDA (or 11011010, or 218 dec), and set the 3rd bit (index 2).

num  11011010
+mask 00000100
+OR | --------
+     11011110
+

We can also use the same operation to check if a given bit is set. As all the bits except for the i-th are zero, we can use the & operator, which will return a non-zero number if and only if the i-th bit in num is 1. The get operation is then num & (1 << i).

Lastly, we might want to remove elements from the bit vector, which means we need the ability to clear a specific bit. A small recap of the & operator.

| & | 0 | 1 |
+|---|---|---|
+| 0 | 0 | 0 |
+| 1 | 0 | 1 |
+

We can see that if we set the mask to all 1, doing num & 1111...1111b doesn’t change the num value. We also see that no matter what value is in num, if any of the bits in the mask is 0, the resulting bit will also be 0. Thus num & 11101b will set the 2nd bit from the right (index 1) to 0 and leave all of the other bits intact.

Constructing such a mask is simple, since we only need to take our set mask from before and flip all the bits using the NOT operator ~, resulting in num & (~(1 << i)). I’ve added extra parentheses to make the order of operations clear. Beware that & and | have a very low precedence (even lower than comparison operators such as ==), so it might be a good idea to be very explicit with parens around bit operations unless you’re sure what you’re doing is correct.

Here’s a similar example as we did for OR, starting with num = 0xDA, clearing the 7th bit (index 6). We first construct the mask step by step 1 << 6 = 01000000b, followed by a negation ~01000000 = 10111111.

num   11011010
+mask  10111111
+AND & --------
+      10011010
+
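
Both of the worked examples can be checked quickly in the console:

```javascript
var num = 0xDA; // 11011010

// Setting the 3rd bit (index 2)
var afterSet = num | (1 << 2);
console.log(afterSet.toString(2)); // 11011110

// Checking if a bit is set
console.log((num & (1 << 1)) !== 0); // true, bit at index 1 is 1
console.log((num & (1 << 2)) !== 0); // false, bit at index 2 is 0

// Clearing the 7th bit (index 6)
var afterClear = num & (~(1 << 6));
console.log(afterClear.toString(2)); // 10011010
```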

Implementing `get`, `set` and `clear` on a 32-bit vector

As mentioned before, let’s first consider only a 32-bit vector stored in a single Number. The operations would be:

// Set the i-th bit to 1
+function set(vec, i) {
+    return vec | (1 << i);
+}
+
+// Clear the i-th bit
+function clear(vec, i) {
+    return vec & (~(1 << i));
+}
+
+// Return the value of the i-th bit
+function get(vec, i) {
+    var value = vec & (1 << i);
+    // we compare against zero so that the result is always a boolean,
+    // instead of the raw value that is returned by the mask
+    return value !== 0;
+}
+

Note that set and clear return a new number as their result instead of modifying their argument, while get returns a boolean. Let’s test to see if it works:

// Since our bit vector is stored in a single number, we simply initialize it as 0.
+var vec = 0;
+
+vec = set(vec, 3);
+console.log("is 3 in vec? " + get(vec, 3));
+// is 3 in vec? true
+console.log("is 4 in vec? " + get(vec, 4));
+// is 4 in vec? false
+
+vec = clear(vec, 3);
+console.log("is 3 in vec? " + get(vec, 3));
+// is 3 in vec? false
+

Remember the number only has 32 bits, so don’t use an index bigger than 31. If you do, it will simply wrap around, as the shift operator only uses the lowest 5 bits of the shift amount, so you’ll get set(0, 0) == set(0, 32).
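
A quick way to see the wrap-around, together with a defensive variant of set which refuses indexes outside of 0 to 31. The name `checkedSet` is made up, and whether you want to throw here is a design choice.

```javascript
console.log(1 << 31); // -2147483648, the highest bit is the sign bit
console.log(1 << 32); // 1, same as 1 << 0
console.log(1 << 33); // 2, same as 1 << 1

// Like `set`, but throws instead of silently wrapping around.
function checkedSet(vec, i) {
    if (i < 0 || i > 31) {
        throw new RangeError("bit index out of range: " + i);
    }
    return vec | (1 << i);
}

console.log(checkedSet(0, 3)); // 8
```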

Implementing `get`, `set` and `clear` on an arbitrary length bit vector

Now we’re finally ready to create the whole data structure, an arbitrary length bit vector. We need to modify our get, set, and clear to calculate the right Number within the array first, and then to do the same bit manipulation they did before.

Going again from the right, bits 0 - 31 will be stored in the 1st Number (at index 0), bits 32 - 63 at index 1, 64 - 95 at index 2, etc. From this, we can see that the index in the bigger array is simply the bit index divided by 32 and rounded down, or Math.floor(i / 32). This gives us the index of the Number.

To get the bit index within the number, we simply take the remainder of dividing by 32, or the modulo 32 of the original bit index. This gives us i % 32 for the bit index. Putting this together gives us the code below. Note that since we’re using an Array, the bit vector is now mutable, unlike the previous 32-bit version using only a Number. I’ve added the original buildVector function to make it easier to copy paste this code as a whole.

// Set the i-th bit to 1
+function set(vec, i) {
+    var bigIndex = Math.floor(i / 32);
+    var smallIndex = i % 32;
+
+    vec[bigIndex] = vec[bigIndex] | (1 << smallIndex);
+}
+
+// Clear the i-th bit
+function clear(vec, i) {
+    var bigIndex = Math.floor(i / 32);
+    var smallIndex = i % 32;
+
+    vec[bigIndex] = vec[bigIndex] & (~(1 << smallIndex));
+}
+
+// Return the value of the i-th bit
+function get(vec, i) {
+    var bigIndex = Math.floor(i / 32);
+    var smallIndex = i % 32;
+
+    var value = vec[bigIndex] & (1 << smallIndex);
+    // we compare against zero so that the result is always a boolean,
+    // instead of the raw value that is returned by the mask
+    return value !== 0;
+}
+
+// A function which takes a number of bits and returns an initialized
+// bit vector of given length.
+function buildVector(bitCount) {
+    // Total number of `Number` values in the vector.
+    // Adding Math.ceil here to make sure we allocate enough space even if the size
+    // is not divisible by 32.
+    var elementCount = Math.ceil(bitCount / 32);
+    var vector = new Array(elementCount);
+
+    for (var i = 0; i < elementCount; i++) {
+        vector[i] = 0;
+    }
+
+    return vector;
+}
+

We can do a similar test as we did before to test our bit vector:

// This time our bit vector is an array of numbers, so we build it using `buildVector`.
+var vec = buildVector(64);
+
+set(vec, 30);
+console.log("is 30 in vec? " + get(vec, 30));
+// is 30 in vec? true
+console.log("is 40 in vec? " + get(vec, 40));
+// is 40 in vec? false
+
+clear(vec, 30);
+console.log("is 30 in vec? " + get(vec, 30));
+// is 30 in vec? false
+
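
To make the index arithmetic concrete, here is a small sketch that only computes where a given bit lands. The locate helper is not part of the bit vector code above; it is a hypothetical name used just for illustration.

```javascript
// Hypothetical helper (not part of the bit vector code above): returns the
// index of the Number holding bit `i` and the bit's position inside it.
function locate(i) {
    return {
        bigIndex: Math.floor(i / 32),
        smallIndex: i % 32
    };
}

console.log(locate(0));  // { bigIndex: 0, smallIndex: 0 }
console.log(locate(31)); // { bigIndex: 0, smallIndex: 31 }
console.log(locate(32)); // { bigIndex: 1, smallIndex: 0 }
console.log(locate(40)); // { bigIndex: 1, smallIndex: 8 }
```

Bit 40 therefore lives in the second Number (index 1) at position 8, which is exactly what set, clear and get compute internally.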

Using Uint32Array instead of an `Array` of `Number`

Modern browsers now provide a better and more efficient alternative to an Array of Number, which is Uint32Array. The difference here is that a JavaScript Array is not exactly the array you would expect if you came out of a computer science class. It behaves more like a hash map with integer keys. You can also store different types in the same array, for example [1, "hello"] is completely valid JavaScript. Secondly, Number is not a 32-bit integer. According to the standard, Number is an IEEE-754 double precision float. The trick here is that the bitwise operators convert their operands to a 32-bit integer before applying the operation. The conversion is defined as an abstract operation, so how the value is actually stored most likely comes down to how the implementation chooses to handle things.

The ideal scenario would be that the JIT (just-in-time compiler) recognizes that we’re only doing bitwise operations on something that starts out as a constant zero, and thus uses a 32-bit integer as the backing store for our data, and also recognizes that the array doesn’t contain anything else, so it wouldn’t have to use a generic implementation that allows different types, but rather a contiguous block of memory. While this might be possible, it’s not something that can be guaranteed to happen 100% of the time, because the JIT would need to understand everything your code is doing to prove that such an optimization is possible. The halting problem, however, implies that a compiler can’t fully analyze arbitrary code, and as such any optimization of this kind can only be based on heuristics.

This is why the Uint32Array type was added to JavaScript. While the compiler/interpreter/JIT can’t know that we only intend to use 32-bit integers, we as the programmers do know it, so we can choose a more specific data structure that allows for exactly that. Uint32Array is a type which has only one purpose, to store unsigned 32-bit integers in a contiguous block of memory.

Using it is actually even simpler than what we did before, as our buildVector function turns into a one liner.

function buildVector(bitCount) {
+    // The constructor accepts a number of 32-bit integers in the array,
+    // which is simply the number of bits in our bit vector divided by 32.
+    // We also keep the `Math.ceil` just to make the API more robust.
+    return new Uint32Array(Math.ceil(bitCount / 32));
+}
+

On the outside, the Uint32Array behaves much like an Array, with the exceptions that its length is fixed and that values assigned to its elements get converted to unsigned 32-bit integers.

var arr = new Uint32Array(10);
+
+arr[0] = "foo";
+arr[1] = "123";
+arr[2] = 3.14;
+
+console.log(arr[0] + " " + arr[1] + " " + arr[2]);
+// 0 123 3
+

Everything else about the bit vector (set, get, and clear) will stay the same, so there isn’t really anything we’re giving up for using the more efficient Uint32Array version.
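
One visible difference between the two variants shows up when setting bit 31, which is the sign bit of a 32-bit integer. The following sketch repeats set and get in a compact form so that it can run on its own, and compares what ends up stored in a plain Array and in a Uint32Array.

```javascript
// Compact copies of set and get, so the snippet is self-contained.
function set(vec, i) {
    vec[Math.floor(i / 32)] |= (1 << (i % 32));
}

function get(vec, i) {
    return (vec[Math.floor(i / 32)] & (1 << (i % 32))) !== 0;
}

var plain = [0];
var typed = new Uint32Array(1);

set(plain, 31);
set(typed, 31);

// The bitwise OR produces a signed 32-bit result, which the Array keeps as is,
// while the Uint32Array converts it to an unsigned value on assignment.
console.log(plain[0]); // -2147483648
console.log(typed[0]); // 2147483648

// Reading the bit back works the same way in both cases.
console.log(get(plain, 31), get(typed, 31)); // true true
```

The stored values look different, but they have the same 32-bit pattern, which is why get returns the same answer for both.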

Conclusion

If you’ve read this far, you should now feel pretty confident about how the bit vector works, and be able to implement it yourself. A bit vector might not be the most popular data structure, but it can come in handy in various scenarios. A specific example could be using binary frames with the WebSocket API, in which case you might want to minimize the network traffic as much as possible. When working with binary frames, you will most likely run into typed arrays such as Uint32Array and into bitwise operators, so at least knowing how a bit vector works can help you there. It’s also useful to know that there are other built-in typed arrays, such as Uint8Array, Int32Array (note the lack of U, as this is a signed integer version of a 32-bit array), Float64Array, etc. For more details on these check out the Indexed collections section under Global Objects on MDN. You might also be interested in seeing browser support of typed arrays in the modern browsers and different polyfill options.

Lastly, I’d like to note a little bit about dynamically sized bit vectors. Much like a regular array, a bit vector can also be implemented in a way that allows for resizing. In the Array variant, we would just need to push a few additional zeroed Number instances into the array to make the bit vector larger, while the Uint32Array variant would require us to allocate a new Uint32Array with a larger size and copy things over. At first it might seem like the Array variant is clearly superior in this regard, but here are a few thoughts:

if the JIT recognized our Array should use an efficient packed 32-bit integer block of memory to store the data, pushing a new element into it would do exactly the same as if we create a new Uint32Array (there could be more optimizations going on, but the same could be said for the JIT optimizing a resize of the Uint32Array variant)
if the Array is backed by a generic array of objects with extended capacity for pushing new elements into it, the push itself wouldn’t cost as much, but there could be a price paid in terms of performance of the regular set, get and clear operations

Note that this is mostly food for thought. I haven’t done any benchmarks comparing the two variants, and could be very wrong with regard to what happens in actual JavaScript implementations. If I had to guess, I’d say the Uint32Array would outperform the Array even with an occasional resize. But feel free to correct me on this in the comments.
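
For the Uint32Array variant, a resize could look roughly like the sketch below. The grow function and its signature are made up for illustration. It relies only on the standard Uint32Array constructor and on the set method of typed arrays, which copies the contents of another array.

```javascript
// Hypothetical helper: returns a bit vector with room for at least
// `bitCount` bits, keeping all bits of `vec`. It never shrinks the vector.
function grow(vec, bitCount) {
    var elementCount = Math.ceil(bitCount / 32);

    if (elementCount <= vec.length) {
        return vec;
    }

    // A new typed array is zero-initialized, so the added bits start cleared.
    var bigger = new Uint32Array(elementCount);
    // Copy the old elements to the beginning of the new array.
    bigger.set(vec);

    return bigger;
}

var vec = new Uint32Array(1);
vec[0] = 5;

var grown = grow(vec, 100);
console.log(grown.length); // 4
console.log(grown[0]);     // 5
```

Since a new array is returned, the caller has to replace its reference to the old vector, which is a small API difference compared to pushing into a regular Array.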

Visualizing TensorFlow Graphs in Jupyter Notebooks

Tue, 30 May 2017 00:00:00 +0000

Prerequisites: This article assumes you are familiar with the basics of Python, TensorFlow, and Jupyter notebooks. We won’t use any of the advanced TensorFlow features, as our goal is just to visualize the computation graphs.

TensorFlow operations form a computation graph. And while for small examples you might be able to look at the code and immediately see what is going on, larger computation graphs might not be so obvious. Visualizing the graph can help both in diagnosing issues with the computation itself and in understanding how certain operations in TensorFlow work and how things are put together.

We’ll take a look at a few different ways of visualizing TensorFlow graphs, and most importantly, show how to do it in a very simple and time-efficient way. It shouldn’t take more than one or two lines of code to draw a graph we have already defined. Now onto the specifics, we’ll take a look at the following visualization techniques:

Exploring the textual graph definition
Building a GraphViz DOT graph from that directly in the Jupyter Notebook
Visualizing the same graph in a locally running instance of TensorBoard
Using a self-contained snippet that uses a cloud-deployed, publicly available TensorBoard instance to render the graph inline in a Jupyter Notebook.

First, let us create a simple TensorFlow graph. Regular operations such as creating a placeholder with tf.placeholder will create a node in the so-called default graph. We can access it via tf.get_default_graph(), but we can also change it temporarily. In our example below, we’ll create a new instance of the tf.Graph object and create a simple operation adding two variables:

$$c = a + b$$

Note that we’re giving explicit names to both of the placeholder variables.

import tensorflow as tf
+
+g = tf.Graph()
+
+with g.as_default():
+    a = tf.placeholder(tf.float32, name="a")
+    b = tf.placeholder(tf.float32, name="b")
+    c = a + b
+

The variable g now contains a definition of the computation graph for the operation $c = a + b$. We can use the g.as_graph_def() method to get a textual representation of the graph for our expression. While the main use of this is for serialization and later deserialization via tf.import_graph_def, we’ll use it to create a GraphViz DOT graph.

Let us take a look at the GraphDef for our simple expression. First, we’ll inspect the names of all of the nodes in the graph.

[node.name for node in g.as_graph_def().node]
+

['a', 'b', 'add']
+

As expected, there are three nodes in the graph. One for each of our variables, and one for the addition operation. The placeholder variable nodes have a name since we explicitly named them when calling tf.placeholder. If we omit the name keyword argument, TensorFlow will simply generate a name on its own, as it did with the add operation.

Next, we can take a look at the edges in the graph. Each GraphDef node has an input field which lists the names of the nodes it has incoming edges from. Let’s take a look:

g.as_graph_def().node[2].input
+

['a', 'b']
+

As we can see, there are two edges, one to each variable. We can feed this directly into GraphViz.

Building a GraphViz DOT graph

GraphViz is a fairly popular library for drawing graphs, trees and other graph-shaped data structures. We’ll use the Python GraphViz package which provides a nice clean interface. We can install it directly inside a Jupyter notebook via !pip install graphviz.

The graph definition itself will be rather simple, and we’ll take inspiration from a similar piece of code in TensorFlow itself (in graph_to_dot.py) which generates a DOT graph file for a given GraphDef. Unfortunately, it is only available as a command line script, and as such we can’t call it directly from our code. This is why we’ll be implementing it ourselves, but don’t worry, it will only be a few lines of code.

from graphviz import Digraph
+
+dot = Digraph()
+
+for n in g.as_graph_def().node:
+    # Each node has a name and a label. The name identifies the node
+    # while the label is what will be displayed in the graph.
+    # We're using the name as a label for simplicity.
+    dot.node(n.name, label=n.name)
+    
+    for i in n.input:
+        # Edges are determined by the names of the nodes
+        dot.edge(i, n.name)
+        
+# Jupyter can automatically display the DOT graph,
+# which allows us to just return it as a value.
+dot
+

Now let’s wrap this in a function and try using it on a more complicated expression.

def tf_to_dot(graph):
+    dot = Digraph()
+
+    for n in graph.as_graph_def().node:
+        dot.node(n.name, label=n.name)
+
+        for i in n.input:
+            dot.edge(i, n.name)
+            
+    return dot
+

We’ll build another graph calculating the area of a circle with the formula $\pi * r^2$. As we can see, TensorFlow does what we would expect and links the same placeholder to two multiplication operations.

g = tf.Graph()
+
+with g.as_default():
+    pi = tf.constant(3.14, name="pi")
+    r = tf.placeholder(tf.float32, name="r")
+    
+    y = pi * r * r
+    
+tf_to_dot(g)
+

Using a local TensorBoard instance to visualize the graph

While GraphViz might be nice for visualizing small graphs, neural networks can grow to quite a large size. TensorBoard allows us to easily group parts of our equations into scopes, which will then be visually separated in the resulting graph. But before doing this, let’s just try visualizing our previous graph with TensorBoard.

All we need to do is save it using the tf.summary.FileWriter, which takes a directory and a graph, and serializes the graph in a format that TensorBoard can read. The directory can be anything you’d like, just make sure you point to the same directory using the tensorboard --logdir=DIR command (DIR being the directory you specified to the FileWriter).

# We write the graph out to the `logs` directory
+tf.summary.FileWriter("logs", g).close()
+

Next, open up a console and navigate to the same directory from which you executed the FileWriter command, and run tensorboard --logdir=logs. This will launch an instance of TensorBoard which you can access at http://localhost:6006. Then navigate to the Graphs section and you should see a graph that looks like the following image. Note that you can also click on the nodes in the graph to inspect them further.

Now this is all nice and interactive, but we can already see some things which make it harder to read. For example, when we type $\pi * r^2$ we generally don’t think of the $r^2$ as a multiplication operation (even though we implement it as such), we think of it as a square operation. This becomes more visible when the graph contains a lot more operations.

Luckily, TensorFlow allows us to bundle operations together into a single unit called a scope. But first, let’s take a look at a more complicated example without using scopes. We’ll create a very simple feedforward neural network with three layers (with respective weights $W_1, W_2, W_3$ and biases $b_1, b_2, b_3$).

g = tf.Graph()
+
+with g.as_default():
+    X = tf.placeholder(tf.float32, name="X")
+    
+    W1 = tf.placeholder(tf.float32, name="W1")
+    b1 = tf.placeholder(tf.float32, name="b1")
+    
+    a1 = tf.nn.relu(tf.matmul(X, W1) + b1)
+    
+    W2 = tf.placeholder(tf.float32, name="W2")
+    b2 = tf.placeholder(tf.float32, name="b2")
+    
+    a2 = tf.nn.relu(tf.matmul(a1, W2) + b2)
+
+    W3 = tf.placeholder(tf.float32, name="W3")
+    b3 = tf.placeholder(tf.float32, name="b3")
+    
+    y_hat = tf.matmul(a2, W3) + b3
+    
+tf.summary.FileWriter("logs", g).close()
+

Looking at the graph in TensorBoard, the result is pretty much what we would expect. The only problem is that TensorBoard displays it as a single expression. It isn’t immediately apparent that we meant to think about our code in terms of layers.

We can improve this by using the tf.name_scope function to create the scopes mentioned above. Let us rewrite our feedforward network code to separate each layer into its own scope.

g = tf.Graph()
+
+with g.as_default():
+    X = tf.placeholder(tf.float32, name="X")
+    
+    with tf.name_scope("Layer1"):
+        W1 = tf.placeholder(tf.float32, name="W1")
+        b1 = tf.placeholder(tf.float32, name="b1")
+
+        a1 = tf.nn.relu(tf.matmul(X, W1) + b1)
+    
+    with tf.name_scope("Layer2"):
+        W2 = tf.placeholder(tf.float32, name="W2")
+        b2 = tf.placeholder(tf.float32, name="b2")
+
+        a2 = tf.nn.relu(tf.matmul(a1, W2) + b2)
+
+    with tf.name_scope("Layer3"):
+        W3 = tf.placeholder(tf.float32, name="W3")
+        b3 = tf.placeholder(tf.float32, name="b3")
+
+        y_hat = tf.matmul(a2, W3) + b3
+    
+tf.summary.FileWriter("logs", g).close()
+

And here’s what the resulting graph looks like, showing both a compact view of the whole network (left) and what it looks like when you expand one of the nodes (right).

Using a cloud-hosted TensorBoard instance to do the rendering

We’ll use the modified snippet from the DeepDream notebook taken from this StackOverflow answer. It basically takes the tf.GraphDef, sends it over to the cloud, and embeds an <iframe> with the resulting visualization right in the Jupyter notebook.

Here’s the snippet in its entirety. All you need to do is call show_graph() and it will handle everything, as shown in the example below on our previous graph g. The obvious advantage of this approach is that you don’t need to run TensorBoard to visualize the data, but the downside is that you need internet access.

# TensorFlow Graph visualizer code
+import numpy as np
+from IPython.display import clear_output, Image, display, HTML
+
+def strip_consts(graph_def, max_const_size=32):
+    """Strip large constant values from graph_def."""
+    strip_def = tf.GraphDef()
+    for n0 in graph_def.node:
+        n = strip_def.node.add() 
+        n.MergeFrom(n0)
+        if n.op == 'Const':
+            tensor = n.attr['value'].tensor
+            size = len(tensor.tensor_content)
+            if size > max_const_size:
+                # tensor_content is a bytes field, so we encode the placeholder text
+                tensor.tensor_content = ("<stripped %d bytes>" % size).encode("utf-8")
+    return strip_def
+
+def show_graph(graph_def, max_const_size=32):
+    """Visualize TensorFlow graph."""
+    if hasattr(graph_def, 'as_graph_def'):
+        graph_def = graph_def.as_graph_def()
+    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
+    code = """
+        <script src="//cdnjs.cloudflare.com/ajax/libs/polymer/0.3.3/platform.js"></script>
+        <script>
+          function load() {{
+            document.getElementById("{id}").pbtxt = {data};
+          }}
+        </script>
+        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
+        <div style="height:600px">
+          <tf-graph-basic id="{id}"></tf-graph-basic>
+        </div>
+    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))
+
+    iframe = """
+        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
+    """.format(code.replace('"', '&quot;'))
+    display(HTML(iframe))
+

# Simply call this to display the result. Unfortunately it doesn't save the output together with
+# the Jupyter notebook, so we can only show a non-interactive image here.
+show_graph(g)
+

Fibonacci Numbers

Sat, 08 Aug 2015 00:00:00 +0000

The Fibonacci numbers are a well-known recursive sequence, which is defined as follows:

f[0] = 0
+f[1] = 1
+f[n] = f[n-1] + f[n-2]
+

The question is, how can we calculate them?

The first idea, and probably the most intuitive one, is to calculate them recursively. Why? Because the structure of the sequence itself is recursive, which means the implementation will be very similar to our definition.

I’ll choose JavaScript as the implementation language, simply because you can just open the developer console in your browser and paste in the snippets to see the results immediately.

1. Straightforward recursive implementation

We can simply take our definition, add a little bit of syntax, and voilà, we’re done.

function fib(n) {
+  if (n < 2) {
+    return n;
+  } else {
+    return fib(n - 1) + fib(n - 2);
+  }
+}
+

First, we should test whether this function actually works.

> [fib(0), fib(1), fib(2), fib(3), fib(4), fib(5)]
+[0, 1, 1, 2, 3, 5]
+

Everything looks nice, but what if we try to calculate a larger number? What is the largest number that our computer will be able to calculate using this implementation? You might be tempted to figure this out by trial and error, but let’s try estimating it first.

By a very rough estimate, we could say that a modern computer does around 1,000,000,000 operations per second. One computer might be 10 or 100 times faster than another computer, but that won’t really bother us, since the conclusion will be the same.

To get any reasonable estimate, we should first figure out the algorithmic complexity of our little function. At first it seems it should be linear, since to calculate fib(20) you only need fib(19) and fib(18), and so on. Except that fib(19) will calculate fib(18) again. We can see this more easily by visualizing it as a tree:

As you can see, we’re calling fib(n) multiple times for the same input. Specifically, the height of the tree will be $n$, since the calculated value decreases by $1$ on each level. If this was a balanced binary tree, we could easily conclude that it has an exponential number of nodes, $2^n$ to be specific, but you can already see that one of the two branches will have fewer children. How many exactly? Let’s use a bit of math.

We can use almost the same recursive formula for calculating the number of nodes, since:

the trees for both fib(0) and fib(1) have 1 node.
the tree for fib(2) has 3 nodes, since it needs to calculate fib(0) and fib(1), which both have 1 node, and then put those two together.
the tree for fib(3) has 5 nodes: 1 for the root, plus the 3 nodes of the tree for fib(2) and the 1 node of the tree for fib(1).
in general, the tree for fib(N) has exactly as many nodes as the trees for fib(N-1) and fib(N-2) combined, plus 1 for the root.
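
These counts can be checked empirically. The sketch below uses a hypothetical countNodes function, which mirrors the recursion of fib but returns the size of the call tree instead of the Fibonacci number.

```javascript
// Hypothetical helper: the number of nodes in the call tree of the naive
// recursive fib(n), following the rules listed above.
function countNodes(n) {
  if (n < 2) {
    return 1;
  }

  // One node for the call itself, plus the two subtrees.
  return countNodes(n - 1) + countNodes(n - 2) + 1;
}

console.log([0, 1, 2, 3, 4, 5].map(countNodes));
// [1, 1, 3, 5, 9, 15]
console.log(countNodes(20));
// 21891
```

Computing fib(20), which is only 6765, already takes 21891 calls with the naive implementation.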

Writing $t_n$ for the number of nodes in the tree for fib(n), we get $t_0 = 1,\ t_1 = 1$ and $t_{n+2} = t_{n+1} + t_{n} + 1$. We can see that this grows at least as fast as the Fibonacci numbers $f_n$, so if we can show that the Fibonacci numbers grow exponentially, it would also mean that the number of nodes in the tree grows exponentially. There are many ways to derive the closed form formula for Fibonacci numbers, but here’s a link to a really nice explanation using generating functions (the same formula can also be derived using linear algebra). The resulting formula is:

$$f_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left(\frac{1 - \sqrt{5}}{2} \right)^n \right)$$

At this point we can see that the Fibonacci numbers grow exponentially, and so will the number of nodes in the computation tree for our naive recursive implementation.
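
As a quick sanity check, the closed form can be evaluated directly. This is only a sketch: fibClosedForm is a name made up for this example, and because it uses floating point arithmetic, the rounded result can only be trusted for fairly small n.

```javascript
// Evaluates the closed form formula for the n-th Fibonacci number.
// Floating point rounding makes this unreliable for large n.
function fibClosedForm(n) {
  var sqrt5 = Math.sqrt(5);
  var phi = (1 + sqrt5) / 2; // approximately 1.618
  var psi = (1 - sqrt5) / 2; // approximately -0.618

  return Math.round((Math.pow(phi, n) - Math.pow(psi, n)) / sqrt5);
}

console.log([0, 1, 2, 3, 4, 5, 10, 20].map(fibClosedForm));
// [0, 1, 1, 2, 3, 5, 55, 6765]
```

Since the absolute value of the second base is smaller than 1, its power quickly becomes negligible, and the Fibonacci numbers grow roughly like $1.618^n$.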

This is where the problem comes from, since a binary tree of height $n$ has $O(2^n)$ nodes, meaning our complexity is exponential (the real complexity is closer to $O(1.6^n)$, but that is still exponential).

Using the $2^n$ upper bound, fib(40) would take roughly $10^{12}$ operations, fib(50) roughly $10^{15}$, and so on. Even if we get a very fast computer, we wouldn’t be able to get anywhere near fib(100).
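
To put these numbers into perspective, we can convert them to running time. The sketch below assumes the rough figure of a billion operations per second used earlier and the $2^n$ upper bound, so treat the results as orders of magnitude only. The names OPS_PER_SECOND and estimateSeconds are made up for this example.

```javascript
// Rough assumption taken from the estimate in the text.
var OPS_PER_SECOND = 1e9;

// Estimated running time in seconds, using the 2^n upper bound.
function estimateSeconds(n) {
  return Math.pow(2, n) / OPS_PER_SECOND;
}

var SECONDS_PER_YEAR = 3600 * 24 * 365;

console.log(estimateSeconds(40) / 60);                // about 18 minutes
console.log(estimateSeconds(50) / (3600 * 24));       // about 13 days
console.log(estimateSeconds(100) / SECONDS_PER_YEAR); // about 4e13 years
```

Even a computer that is 100 times faster would not make a meaningful dent in the last number.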

We can already see that this algorithm is clearly bad and wouldn’t be very practical in real life (if you ever needed Fibonacci numbers in real life), so let’s try to improve it.

It feels as if the algorithm should be linear. I bet that if someone asked you to calculate the first 20 Fibonacci numbers on paper, you’d start with $0$ and $1$, and then just iterate forward.

2. Recursive implementation with dynamic programming

Ideally we’d like to keep our simple recursive implementation while improving the performance to a point where it’s comparable to an iterative solution (in terms of speed). Earlier we established that the main bottleneck lies in the repeated computations.

We’ll use dynamic programming to fix this, which basically introduces a cache (or a memoization mechanism, also called a dynamic programming table) which is used to store the intermediate results. Once we compute a number for a specific parameter, we’ll store it in the table and never compute it again. This way we only need to compute each number once, landing at linear time complexity.

var table = [0, 1];
+
+function fib(n) {
+  if (typeof table[n] !== "undefined") {
+    return table[n];
+  }
+
+  table[n] = fib(n - 1) + fib(n - 2);
+  return table[n];
+}
+

Note that we don’t need to resize the array to fit the values. This is only possible due to JavaScript’s array implementation, which behaves a lot more like a hash map than like an array.

While we did speed up the algorithm significantly, we also traded computation time for memory, as computing fib(N) will require a table of size N to store the intermediate results. Before optimizing this further, we can look at one more dynamic programming solution.

In general there are two ways to approach dynamic programming. One is top-down, which is what we’ve done in the previous example, and the second one is bottom-up, which is shown in the next snippet.

function fib(n) {
+  var table = [0, 1];
+
+  for (var i = 2; i <= n; i++) {
+    table[i] = table[i - 1] + table[i - 2];
+  }
+
+  return table[n];
+}
+

Highlighting the differences between top-down and bottom-up:

top-down generally starts at the solution, and recursively computes all the dependencies using memoization
bottom-up builds up bigger solutions from smaller ones, until it reaches the final solution

This is an obviously simple example, but it is quite useful to know both of these approaches, as some problems are easier to solve top-down, and some are easier to solve using the bottom-up approach.

Regardless of which approach you choose, it still uses $O(N)$ memory. While this is not ideal if you just want to compute a single value, having the table pre-computed might come in very handy if you’re calling the function often to get different Fibonacci numbers. I’ll leave it as an exercise to the reader to modify the bottom-up approach to remember the cached values between the calls, and only compute the needed part of the table, such that calling fib(10) and then fib(15) would compute the first 10 Fibonacci numbers only once.
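
If you want to compare notes, here is one possible way to approach the exercise. It is only a sketch, and it assumes that keeping the table in a variable outside of the function is acceptable.

```javascript
// The table lives outside of the function, so it survives between calls.
var table = [0, 1];

function fib(n) {
  // Continue where the previous call stopped, and only compute the
  // entries that are not in the table yet.
  for (var i = table.length; i <= n; i++) {
    table[i] = table[i - 1] + table[i - 2];
  }

  return table[n];
}

console.log(fib(10));      // 55
console.log(table.length); // 11
console.log(fib(15));      // 610
console.log(table.length); // 16
```

The second call only runs the loop for the indexes 11 to 15, since everything up to index 10 is already stored in the table.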

3. Iteration

Last but not least, we can get rid of the memoization table, and only compute the n-th number. This is rather easy by modifying the bottom-up dynamic programming approach.

function fib(n) {
+  var x = 0;
+  var y = 1;
+
+  if (n >= 2) {
+    for (var i = 2; i <= n; i++) {
+      var tmp = y;
+      y = x + y;
+      x = tmp;
+    }
+  } else {
+    return n;
+  }
+
+  return y;
+}
+

The advantage of this approach is that we get the best of both worlds. The algorithm runs in linear time and consumes only a constant amount of memory, and there’s no recursion, so we don’t have to worry about stack-limit-exceeded types of errors.

The downside is that if you’re going to be calculating lots of Fibonacci numbers, it will do the work over and over again, while the dynamic programming approaches could make use of memoization. (Note that the iterative approach could be also classified as bottom-up dynamic programming, but for the sake of illustration, I’m showing dynamic programming with an explicit use of a memoization table.)

Summary

In conclusion, there’s no single best solution, and you should pick one based on your use case. While Fibonacci numbers are a very simple example, you can already see that there are multiple approaches to the same problem, without a clear winner.

Thoughts on OS X Yosemite, Arch Linux and xmonad

Sun, 16 Nov 2014 00:00:00 +0000

I’ve been using OS X as my main machine for quite a while now, mainly because I got into Ruby development, and having a Mac is just the thing you do when you write Ruby (at least that’s what I thought back then). One of the main reasons why I really fell in love with the Mac is that things just work. There is no hassle in setting up your drivers, or connecting a printer, or getting your favorite app to work. Everything works out of the box.

We even have homebrew, which is really nice, as long as the thing you need has a formula that someone has tried before. This is where things start to get a bit hairy sometimes though. Things that are popular usually work 100%. On the other hand, programs which aren’t in the core repertoire of a Mac developer either don’t have a formula at all, or it is broken and/or outdated (I know there are things that work wonderfully on the Mac that don’t work so well on other platforms, but that’s not the point here). To put this in other words, as long as I was doing what everyone else was doing I really enjoyed the Mac.

One thing I really admired about the Mac is that everything was designed to be perfect. The key word here is was. We got a whole new user interface with the OS X Yosemite update, which I actually do like a lot, but we also got a huge number of buggy and incomplete things. There are parts of the UI that are clearly broken, especially in the dark skin. Handoff and Continuity work only when they want to work, and when they do, they are really slow. The most annoying thing is that when somebody calls me on my phone, and the Mac starts ringing, and I pick up the phone because I have it close, the Mac keeps ringing loudly for another 3-5 seconds, making it impossible to hear the person on the phone. While this might seem really minor, it gets so annoying that I turned off the feature after 2-3 phone calls. There are more things like this that are tiny and broken, it just isn’t the Apple it used to be.

This is when I decided that my next computer isn’t going to be a Mac.

Choosing the right Linux distro

I’ve always been using Linux on the side, mostly because you can get a tiny 13" Lenovo for a third of the price of a Macbook Air (the Macbook Air starts at about $1500 in my country, while the Lenovo I’m typing on right now cost about $500). I always just installed Ubuntu, since that’s the thing that works.

The thing is that I never really liked Ubuntu itself. It’s an OK distribution, and I would recommend it to anyone who isn’t familiar with Linux, just because you can get it working really quickly and there are no surprises on the way.

But I don’t want to be a casual Linux user anymore, I want to customize everything based on my needs. I don’t want to use 95% of the apps that come installed with Ubuntu, not even Gnome. The reason why I used it is that it installs with almost a single click, but that’s a poor reason to choose a distribution.

This is where I made the choice to go with Arch Linux. I tried it once a few years ago, but it didn’t really stick back then, because my mentality was to install everything and make it look like a Mac, which obviously didn’t work because I was lacking the Mac apps.

But now I think I finally understand the philosophy that one should follow when using a distribution like Arch Linux.

xmonad

Choosing a window manager was probably the easiest decision. At first I thought about not using a GUI at all, and just living in a tmux session, but that wouldn’t really work with the web-based development that I do these days, so I just grabbed the next closest thing to tmux, which is xmonad.

It’s been only a few days, but I can already feel the power. Just being able to hit the keyboard once and have a terminal pop up instantly is an amazing feature. I’m not sure if it’s the terminal emulator I’m using, or if it’s xmonad, but opening a new terminal is really, really, really fast. If I didn’t have such a slow ~/.zshrc it would open as fast as I let go of the keyboard, but right now there is about a 100ms delay (yeah, I’m gonna have to optimize my zsh). This might not seem like a big deal, but actually being able to open a terminal at any time, type one command, and immediately close it is amazing. I can be browsing the web and see something I want to try, open a split terminal window and retype the command from the web page, then close the terminal and keep browsing, all without ever touching the mouse.

I’m still waiting to try this on a bigger screen than my tiny 13", but I’m pretty sure that once I run this on a big screen, I won’t be able to go back to using a regular ol’ OS X (yes, there are xmonad-like things for OS X, but they don’t really work in my testing).

Parsing CSS with Parsec

Sun, 10 Aug 2014 00:00:00 +0000

This article is a small introduction to Parsec, the parser combinator library for Haskell. We’ll use it to parse a simple CSS file such as the following.

.container h1 {
+  color: rgba(255, 0, 0, 0.9);
+  font-size: 24px;
+}
+

First we need to figure out our data structure which will represent the syntax tree. Since this is just an introduction, we’ll go easy and ignore features like media queries.

In order to create the structure we need to figure out how to name things. We can look at the grammar definition for CSS 2.1 to figure out how things are named, from which we can tell that the main unit is a ruleset, which has a selector and a list of declarations. Let’s call each of them a rule instead of a declaration to keep things short. Each rule then has a property and a value.

type Selector = String
+data Rule = Rule String String deriving Show
+data Ruleset = Ruleset Selector [Rule] deriving Show
+

Basic parsec combinators

The way Parsec works is that you build up small parsers and combine them into bigger ones. We could write a parser for a rule, such as color: red;, which would first parse a property, then a colon, then some optional spaces and finally a value with an optional semicolon at the end.

Here are some basic parsers from the Parsec library.

char - Parses a single character.
string - Parses a given string.
optional - Takes a parser and makes it optional.
many - Takes a parser for a single item and makes it into a parser for 0 to N items.
many1 - Same as many, only that it requires at least one.
letter - Parses any letter.
digit - Parses a digit.

To parse a colon we could do char ':'. If that colon was optional, we could just combine it with the optional combinator, such as optional (char ':'), and so on.

Here’s what a simple parser for Rule could look like.

import Text.Parsec
+import Text.Parsec.String
+
+rule :: Parser Rule
+rule = do
+    p <- many1 letter
+    char ':'
+    optional (char ' ')
+
+    v <- many1 letter
+    optional (char ';')
+  
+    return $ Rule p v
+

You might have already noticed that Parser is indeed a Monad, which is +why we’re using the do notation, and why we are able to combine many +small parsers together. This is where the power of Parsec comes in, +because it is very easy to combine small parsers to build something that +you can use in the real world.

Now comes the time to test our parser. Parsec defines a function called +parse, which accepts a parser, a source name and a source string, and +returns Either a ParseError if our parsing failed, or the parsed +value.

λ> import Text.Parsec
+λ> parse (char ':') "test parser" ":"
+Right ':'
+λ> parse (char ':') "test parser" "a"
+Left "test parser" (line 1, column 1):
+unexpected "a"
+expecting ":"
+λ> parse (many1 letter) "test parser" "hello"
+Right "hello"
+
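Outside of GHCi the same parse call fits into ordinary code. Here’s a minimal sketch, assuming the parsec package is installed; parseRule and describe are hypothetical helper names made up for this example, not something Parsec provides.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Rule = Rule String String deriving (Show, Eq)

-- The first version of the rule parser from above.
rule :: Parser Rule
rule = do
    p <- many1 letter
    char ':'
    optional (char ' ')
    v <- many1 letter
    optional (char ';')
    return $ Rule p v

-- Hypothetical helper: fixes the source name so callers only pass the input.
parseRule :: String -> Either ParseError Rule
parseRule = parse rule "inline css"

-- Hypothetical helper: turns the Either into a readable message.
describe :: String -> String
describe input = case parseRule input of
    Left err         -> "parse error: " ++ show err
    Right (Rule p v) -> p ++ " is set to " ++ v
```

Pattern matching on Left and Right is all there is to it, since the result of parse is a plain Either.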

We might also need to say something like parse any number of letters or +digits. This is where the <|> combinator comes in, which allows us to +say the or part. It takes two parsers as arguments and returns a new +parser, which tries to parse with the first one, and if it fails tries +the second one.

λ> parse (many1 $ letter <|> digit) "test parser" "hello123"
+Right "hello123"
+

When parsing fails

There is one very important thing to understand here. As the parsers try to parse the input, they consume it. If you use <|> to combine two parsers together, and the first parser fails after already consuming some input, the second parser is not tried at all and the combined parser fails. Here’s an example that illustrates this.

λ> parse (string "hay" <|> string "hoy") "test parser" "hoy"
+Left "test parser" (line 1, column 1):
+unexpected "o"
+expecting "hay"
+

We’re trying to parse either "hay" or "hoy", giving it an input of "hoy". It seems that this should obviously succeed, but it doesn’t, because the first parser consumes the first character "h" and then fails, so the second parser never gets a chance to run.

We could rewrite this in another way.

λ> parse (char 'h' >> (string "ay" <|> string "oy")) "test parser" "hoy"
+Right "oy"
+

If we used a do block we could’ve actually achieved the end result of +parsing the whole "hoy" string, but there’s a much easier way, by +using try.

Using try on any parser makes it backtrack when it fails while +consuming a part of the input.

λ> parse (try (string "hay") <|> string "hoy") "test parser" "hoy"
+Right "hoy"
+

Note that we only need to use try with our first parser, since there +is nothing left to do if the second parser fails. try doesn’t affect +how the parser works, only what happens when it fails.

The bottom line here is, if you’re combining together multiple parsers +where one could consume some input and then fail, use try. There are +cases when you don’t need to do this, such as when we did letter <|> digit. Since those two parsers don’t overlap in their domain, we don’t +need to use try there.
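To see the three situations side by side, here’s a small sketch, again assuming the parsec package is available. The run helper is hypothetical and only exists to throw away the error details so that the results are easy to compare.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser and keep only success or failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p "try demo"

-- Both alternatives start with 'h', so the first one can consume and then fail.
withoutTry :: Parser String
withoutTry = string "hay" <|> string "hoy"

-- try rewinds the input when the first alternative fails.
withTry :: Parser String
withTry = try (string "hay") <|> string "hoy"

-- letter and digit never accept the same first character, so no try is needed.
noOverlap :: Parser String
noOverlap = many1 (letter <|> digit)
```

With these definitions run withoutTry "hoy" gives Nothing, while run withTry "hoy" gives Just "hoy".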

The reason why parsec behaves this way is simply because of performance. +Careless usage of try can make the parser slower, but since we’re just +trying to understand how things work, we don’t need to worry about this.

The CSS parser

Before we move on any further, let’s improve our original rule parser. We didn’t really account for spaces, since CSS rules can be indented and there can be an arbitrary number of spaces after the :, and values can have more than just letters, such as #FFF.

We can use the spaces parser which skips zero or more whitespace characters.

rule :: Parser Rule
+rule = do
+    p <- many1 letter
+    char ':'
+    spaces
+
+    v <- many1 letter
+    char ';'
+  
+    return $ Rule p v
+

Next we need to figure out how to tell parsec that we also want things +other than just letters in the property value. We could use oneOf to +name all of the symbols we’d like to accept, such as oneOf "#()%", +which parses one character out of the given set. But to keep things +simple, let’s just say that the value can be anything but a ;. We +can use the noneOf combinator for that. We’ll also make the ; +non-optional to save ourselves some trouble, and accept any number of +whitespace after the whole rule definition.

rule :: Parser Rule
+rule = do
+    p <- many1 letter
+    char ':'
+    spaces
+
+    v <- many1 (noneOf ";")
+    char ';'
+    spaces
+
+    return $ Rule p v
+

Let’s test this out.

λ> parse rule "css parser" "background: #fafafa;"
+Right (Rule "background" "#fafafa")
+λ> parse rule "css parser" "background: rgba(255, 255, 255, 0.3);"
+Right (Rule "background" "rgba(255, 255, 255, 0.3)")
+
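For comparison, here’s roughly what the stricter oneOf approach mentioned earlier could look like. This is only a sketch: strictValue is a hypothetical name, and the set of accepted symbols is my own guess at what a simple value needs, not a complete list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Rule = Rule String String deriving (Show, Eq)

-- Hypothetical whitelist: letters, digits and a handful of symbols.
strictValue :: Parser String
strictValue = many1 (alphaNum <|> oneOf "#()%., ")

rule :: Parser Rule
rule = do
    p <- many1 letter <* char ':' <* spaces
    v <- strictValue <* char ';' <* spaces
    return $ Rule p v
```

The trade-off is that anything outside of the whitelist, such as a $ inside the value, now makes the parser fail instead of being accepted silently.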

We can even try parsing multiple rules at once using many1.

λ> parse (many1 rule) "css parser" "background: rgba(255, 255, 255, 0.3); color: red;\nborder: 1px solid black;"
+Right [Rule "background" "rgba(255, 255, 255, 0.3)",Rule "color" "red",Rule "border" "1px solid black"]
+

Now moving onto the parser for a whole ruleset. Let’s do the same thing +as we did with values and say that a selector can be any character +except for {. Next we have the {, followed by any number of spaces, +followed by a list of rules, followed by a closing }.

ruleset :: Parser Ruleset
+ruleset = do
+    s <- many1 (noneOf "{")
+    char '{'
+    spaces
+
+    r <- many1 rule
+    char '}'
+    spaces
+
+    return $ Ruleset s r
+

Let’s test this out.

λ> parse ruleset "css parser" "p { color: red; }"
+Right (Ruleset "p " [Rule "color" "red"])
+λ> parse ruleset "css parser" "p { background: #fafafa;\n color: red; }"
+Right (Ruleset "p " [Rule "background" "#fafafa",Rule "color" "red"])
+

And everything seems to be working properly. You might notice that our +selector is being parsed as "p " instead of just "p". This is +because we were too relaxed on our definition, but that’s easy to fix. +But first let’s do a bit of refactoring.

Refactoring the parser using Applicative

Because the Parser monad is also an instance of Applicative, we can use a lot of the combinators that Applicative gives us to clean up our code. The most useful ones are *> and <*, where *> takes two parsers, runs the first one, throws away the result, then runs the second one and returns its result (exactly the same as >> does for monads). <* does the same thing the other way around: it still runs both parsers from left to right, but returns the result of the first one. Here are a couple of examples.

The reason why I’m hiding the <|> in the import here is because we +need the definition from Parsec, not from Applicative.

λ> import Control.Applicative hiding ((<|>))
+λ> parse (spaces *> string "hello") "test parser" "  hello"
+Right "hello"
+λ> parse (char '(' *> string "hello" <* char ')') "test parser" "(hello)"
+Right "hello"
+λ> parse (char ';' <* spaces) "test parser" ";     "
+Right ';'
+λ> parse (char ';' <* spaces) "test parser" "; \n\n"
+Right ';'
+

As you can see the usage is pretty straightforward. We could write the same thing using do notation, but using the Applicative combinators makes the code easier to read once you get used to them. You can think of them as pointing in the direction of the result.
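To make the comparison concrete, here’s the same parser written both ways. The names parensDo and parensAp are made up for this sketch, and it assumes the parsec package is available.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- do notation: every intermediate result has to be named or discarded.
parensDo :: Parser String
parensDo = do
    _ <- char '('
    s <- many1 letter
    _ <- char ')'
    return s

-- Applicative style: the arrows point at the result we keep.
parensAp :: Parser String
parensAp = char '(' *> many1 letter <* char ')'
```

Both parsers accept "(hello)" and return "hello".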

Here’s how we could refactor our rule parser.

rule :: Parser Rule
+rule = do
+  p <- many1 letter <* char ':' <* spaces
+  v <- many1 (noneOf ";") <* char ';' <* spaces
+
+  return $ Rule p v
+

We can even define a helper that takes a character and parses it together with any whitespace that follows, since we’re using <* spaces quite a lot, but this is just a matter of taste.

paddedChar c = char c <* spaces
+
+rule :: Parser Rule
+rule = do
+  p <- many1 letter <* paddedChar ':'
+  v <- many1 (noneOf ";") <* paddedChar ';'
+
+  return $ Rule p v
+

Let’s do the same thing for our ruleset parser.

ruleset :: Parser Ruleset
+ruleset = do
+    s <- many1 (noneOf "{")
+    r <- paddedChar '{' *> many1 rule <* paddedChar '}'
+
+    return $ Ruleset s r
+

Now that we have refactored everything, it’s time to make the selector parsing more strict. The new parser will be defined as a sequence of characters consisting of letters, numbers, dots and hashes, followed by any number of spaces.

selector :: Parser String
+selector = many1 (oneOf ".#" <|> letter <|> digit) <* spaces
+

Then the actual selector for the ruleset will be just many of our +selector parsers in a row, separated by spaces. We’ll use the sepBy1 +combinator for this, which takes a parser specifying the separator and +returns a list of parsed values.

λ> parse (selector `sepBy1` spaces) "test parser" ".container h1 "
+Right [".container","h1"]
+
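sepBy1 works with any separator, not only spaces. As a hypothetical extension that the article’s parser doesn’t include, here’s a sketch of parsing comma-separated selectors such as h1, .intro p by nesting two sepBy1 calls. selectorGroup is a made-up name.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

paddedChar :: Char -> Parser Char
paddedChar c = char c <* spaces

selector :: Parser String
selector = many1 (oneOf ".#" <|> letter <|> digit) <* spaces

-- Hypothetical extension: one or more selectors separated by commas, where
-- each selector is itself a space-separated list of parts.
selectorGroup :: Parser [String]
selectorGroup = (unwords <$> selector `sepBy1` spaces) `sepBy1` paddedChar ','
```

The inner sepBy1 collects the parts of one selector, and the outer one splits the whole group on commas.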

Now that we’ve successfully parsed the selector, we can combine it back into a single string using the unwords function from the Prelude.

ruleset :: Parser Ruleset
+ruleset = do
+    s <- selector `sepBy1` spaces
+    r <- paddedChar '{' *> many1 rule <* paddedChar '}'
+
+    return $ Ruleset (unwords s) r
+

And let’s test this once again to make sure everything works.

λ> parse ruleset "css parser" ".container h1 { color: red; }"
+Right (Ruleset ".container h1" [Rule "color" "red"])
+

As you can see, our selector now doesn’t contain the trailing spaces.
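One thing to watch out for is that the CSS sample from the beginning of the article contains font-size, and a property parser built from many1 letter will stop at the hyphen. Here’s a sketch of how the pieces could be put together to parse that sample. The property and stylesheet parsers are my own hypothetical additions, everything else follows the definitions above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Selector = String
data Rule = Rule String String deriving (Show, Eq)
data Ruleset = Ruleset Selector [Rule] deriving (Show, Eq)

paddedChar :: Char -> Parser Char
paddedChar c = char c <* spaces

-- Assumption: property names are letters and hyphens, e.g. font-size.
property :: Parser String
property = many1 (letter <|> char '-')

rule :: Parser Rule
rule = do
    p <- property <* paddedChar ':'
    v <- many1 (noneOf ";") <* paddedChar ';'
    return $ Rule p v

selector :: Parser String
selector = many1 (oneOf ".#" <|> letter <|> digit) <* spaces

ruleset :: Parser Ruleset
ruleset = do
    s <- selector `sepBy1` spaces
    r <- paddedChar '{' *> many1 rule <* paddedChar '}'
    return $ Ruleset (unwords s) r

-- Hypothetical top-level parser: one or more rulesets up to the end of input.
stylesheet :: Parser [Ruleset]
stylesheet = spaces *> many1 ruleset <* eof
```

The eof at the end makes sure that leftover input is reported as an error instead of being ignored.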

Closing thoughts

The parser we developed in this article is far from complete, but feel +free to extend it to support things like pseudo classes, comments, etc.

While it’s not so common to do TDD in Haskell, I’d recommend writing a +lot of unit tests for your parser. It’s easy to play around in the REPL +and test things out, but once you start composing multiple parsers +together it gets very tedious to have to check different versions of the +string you’re parsing every time you make a change. Unlike in regular +Haskell code you can’t really rely on the type system that much, since +you’re just working with strings.
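You don’t even need a testing library to get started. Here’s a minimal sketch of a table of inputs and expected results, checked with nothing but the Prelude. The cases table and the failures helper are hypothetical names, and a real project would more likely reach for a proper test framework.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Rule = Rule String String deriving (Show, Eq)

rule :: Parser Rule
rule = do
    p <- many1 letter <* char ':' <* spaces
    v <- many1 (noneOf ";") <* char ';' <* spaces
    return $ Rule p v

-- Hypothetical test table: an input paired with the expected result,
-- where Nothing means that the parser should fail.
cases :: [(String, Maybe Rule)]
cases =
    [ ("color: red;",         Just (Rule "color" "red"))
    , ("background:#fafafa;", Just (Rule "background" "#fafafa"))
    , ("color red;",          Nothing)
    , ("color: red",          Nothing)
    ]

-- Inputs whose actual result differs from the expected one.
failures :: [String]
failures = [input | (input, expected) <- cases, actual input /= expected]
  where
    actual = either (const Nothing) Just . parse (rule <* eof) "test"
```

An empty failures list means every case behaved as expected, and adding a regression test is just one more line in the table.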

Lens Tutorial - Stab & Traversal (Part 2)

Wed, 06 Aug 2014 00:00:00 +0000

In the first article in the series about lenses, +we’ve looked at the motivation behind the lens library, and we also +derived the basic type of Lens s a.

In this article we’ll go deeper and explain the reasoning behind the more generic Lens s t a b type. We’ll also take a look at how we can get a multi-focus lens using a Traversal.

Just to reiterate, here’s what the type we derived in the previous article looks like.

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s
+

What we’ll do here is further generalize it so that we can change the type of +the focus.

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t
+

Now you might be thinking that four type parameters is a bit much, but bear with me here. If we compare our Lens s t a b to something like fmap, we can see a bit of a resemblance there.

λ> :t fmap
+:: Functor f => (a -> b) -> f a -> f b
+

Much like a function a -> b can be applied to an f a to turn it into an f b, a Lens s t a b allows us to change a to b, which changes the shape of s to t. We can also read it as: a lens allows us to look at an a inside an s, and if we replace the a with a b, the s becomes a t. Here’s a simple example using tuples.

λ> :t ("hello", "world")
+:: (String, String)
+λ> :t over _1 length ("hello", "world")
+:: (Int, String)
+λ> over _1 length ("hello", "world")
+(5,"world")
+

Initially we started out with s :: (String, String) and ended up with t :: (Int, String) by applying a String -> Int function on the first element of +the tuple. The specific type of the _1 lens in this case would be Lens (String, String) (Int, String) String Int.
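If you don’t have the lens library at hand, the same idea can be reproduced with nothing but base. This is a sketch and not the library’s actual code: firstL and over' are hypothetical stand-ins for _1 and over.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- Hypothetical stand-in for _1: focuses on the first element of a pair and
-- allows its type to change from a to b.
firstL :: Lens (a, c) (b, c) a b
firstL f (a, c) = fmap (\b -> (b, c)) (f a)

-- Hypothetical stand-in for over: runs the lens with the Identity functor.
over' :: Lens s t a b -> (a -> b) -> s -> t
over' ln g = runIdentity . ln (Identity . g)
```

Evaluating over' firstL length ("hello", "world") gives (5,"world"), the same as the lens version.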

It’s important to understand that all of the derivations we made for Lens s a +still hold for Lens s t a b, since it’s just a bit more generic. In fact you +can write the following (as it is done in the lens library.)

type Lens' s a = Lens s s a a
+

I’ll leave it as an exercise to the reader to go through all of the steps we +did previously and use Lens s t a b instead.

Traversal - the multi foci lens

Disclaimer: When I say list I really mean Data.Traversable, however +using a list makes things easier to understand. I also wrote an article on +Traversable if you’re unfamiliar with +it.

While the lenses we’ve established so far are useful, they do have their shortcomings. One example is nested lists. Let’s take a look.

data User = User String [Post] deriving Show
+data Post = Post String deriving Show
+

Now if I give you a list of users and ask you to give me all of the titles of their posts, you’ll probably not be very happy about that. Not that it’s difficult, but there is some work involved.

With Traversal and traverse we can focus on all elements of a list and do +this in a single step. But first, let’s define us some lenses to work with the +types. In a real world application we’d use Template Haskell to generate the +lenses automatically, but for the sake of exercise let’s do it manually here.

posts :: Lens' User [Post]
+posts f (User n p) = fmap (\p' -> User n p') (f p)
+
+title :: Lens' Post String
+title f (Post t) = fmap Post (f t)
+

We got two lenses, one that focuses on User’s posts, and another one for the +post’s title. Let’s also define us some test data to play around with.

users :: [User]
+users = [User "john" [Post "hello", Post "world"], User "bob" [Post "foobar"]]
+

Now let’s open up GHCi, load the file with these definitions, import Control.Lens and see what we can do.

λ> view (traverse.posts) users
+[Post "hello",Post "world",Post "foobar"]
+

This seems to do what we want, we gave it a list of users and pulled out a list +of posts. Note that we used traverse every time the current focus was a +list, which is just in the first step on users.

The next step is to go deeper to fetch the post title. If you look at the type of our current lens traverse.posts, you’ll see that it focuses on [Post].

λ> :t traverse.posts
+:: (Traversable t, Applicative f) => 
+   ([Post] -> f [Post]) -> t User -> f (t User)
+

In order to reach out to each post, we need to use traverse again. You can +think of traverse as something that allows us to focus on multiple targets at +once, in a similar way that map allows us to apply a function to all elements +of a list.

We started with [User].
We can’t directly apply the posts lens, since that requires a User.
traverse changes the focus to each User inside the [User].
traverse.posts now works, since our target is just a User, so we can compose to get a lens of traverse.posts.

It is also important to note here that lens composition reads in the opposite direction to what is usual in Haskell. You can think of it as a sort of object accessor notation in an object-oriented language, where you’d do foo.bar.baz.

Just to make this point crystal clear, here’s how function composition works +for regular functions. The *2 gets applied before the +1.

λ> ((+1).(*2)) 1
+3
+

With lenses it goes the other way and the traverse goes before the posts +lens.

Traversing deeper and deeper

Our previous example worked out just as we wanted, so let’s try to go deeper +and actually fetch the title of each Post from our users list.

λ> view (traverse.posts.traverse.title) users
+"helloworldfoobar"
+

Huh? This isn’t what we wanted at all! Lens must be completely broken?!!?1!

Much like we got [Post] from traverse.posts, it would make sense to get +[String] from traverse.posts.traverse.title, but instead we got one big +String with all of the titles combined. In order to understand why this is +happening we need to look more closely at how traverse works.

Here’s a simpler example that we can use to reproduce what we had previously.

λ> view traverse ["hello", "world"]
+"helloworld"
+

The reason for this behavior is that if we use view together with traverse +it will use the Monoid instance of our focus and smash them together.

Let’s see how this works by inlining the definition of view.

view :: Lens s a -> s -> a
+view ln s = getConst $ ln Const s
+

Inlining the arguments we get the following.

λ> view traverse ["hello", "world"]
+"helloworld"
+λ> getConst $ traverse Const ["hello", "world"]
+"helloworld"
+

We can already see that it is not the lens library that does the magic, it’s +the traverse combined with Const. The view just picks the Const +applicative to be used with the traverse function.

Now let’s move on to inlining the definition of traverse, which for a list looks like the following.

traverse _ [] = pure []
+traverse f (x:xs) = (:) <$> f x <*> traverse f xs
+

Since this is a recursive function and our list has two elements, we need to +inline it in multiple steps.

-- Inlined the arguments into the definition.
+(:) <$> Const "hello" <*> traverse Const ["world"]
+-- First recursive call to traverse inlined.
+(:) <$> Const "hello" <*> ((:) <$> Const "world" <*> traverse Const [])
+-- Second recursive call to traverse inlined.
+(:) <$> Const "hello" <*> ((:) <$> Const "world" <*> pure [])
+

This whole expression will return a type of Const String [a], from which we +need to extract the String using getConst, as shown above.

λ> getConst $ (:) <$> Const "hello" <*> ((:) <$> Const "world" <*> pure [])
+"helloworld"
+

As you can see we’re still getting the same result as in the case of view traverse ["hello", "world"], which means we’re on the right track. But this still doesn’t explain why the two strings are being concatenated together.

Const as a Monoid

To understand the concatenation we need to take a look at how the Applicative +instance for Const is implemented, but let’s think about this first.

Const a b acts as a Functor that pretends to contain a value of type b, but in reality hides a value of type a. That’s why if we have a Const Int String and fmap a function of type String -> Int over it, we’ll get a Const Int Int, even though there was no actual String value.

λ> let a = Const 3 :: Const Int String
+λ> :t a
+:: Const Int String
+λ> :t fmap length a
+:: Const Int Int
+λ> getConst $ fmap length a
+3
+

If you’re having trouble understanding this, check out my first article on +Lenses +which explains this in a bit more detail.

Now we’re faced with the problem of implementing an Applicative instance. The +problem being that Applicative defines pure :: a -> f a, which takes a +value and lifts it into the Applicative. But because we’re working with +Const, there is no actual value being lifted, as in the case of a Functor +where we didn’t really apply the function.

Const Int String does not contain any String, it only contains the +Int. That’s why if we do pure 3 to get back a Const String Int, we must +throw away the 3 and somehow create a String to hide it into the Const. +We need to have a way to create a value for the type we’re hiding. But how do +we do that when we have nothing?

We use a Monoid and mempty!

instance Monoid m => Applicative (Const m) where
+    pure _ = Const mempty
+

We just throw away the argument to pure and create a new Const hiding the +value returned by mempty, which for a String in our previous example would +be "".

λ> getConst $ (pure 3 :: Const String Int)
+""
+

Next up is the definition of <*>, which is rather simple now that we know that our hidden value is a Monoid. The way that <*> works is that it takes two Applicatives and smashes them together. In the general case it would mean applying the function in the first one to the value in the second one, but because our Const is just pretending to have a function while it has none, we do not need to apply it. We just need to find a way to combine our two hidden monoidal values, which is exactly where mappend comes into play.

We simply extract the hidden values and mappend them together to create a new +Const.

instance Monoid m => Applicative (Const m) where
+    pure _ = Const mempty
+    Const f <*> Const x = Const (f `mappend` x)
+
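To convince ourselves that these two lines really are all that’s needed, here’s a self-contained sketch that defines its own copy of the type instead of importing it. Konst, getKonst and smash are hypothetical names picked to avoid clashing with the real Const from base.

```haskell
-- Our own copy of Const: it pretends to hold a b, but only stores an m.
newtype Konst m b = Konst { getKonst :: m }

instance Functor (Konst m) where
    -- There is no b inside, so the function is never applied.
    fmap _ (Konst m) = Konst m

instance Monoid m => Applicative (Konst m) where
    pure _ = Konst mempty
    Konst f <*> Konst x = Konst (f `mappend` x)

-- Hypothetical helper: what view traverse boils down to for a list.
smash :: Monoid m => [m] -> m
smash = getKonst . traverse Konst
```

Calling smash ["hello", "world"] returns "helloworld", which is exactly the behaviour we saw earlier.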

Intuition behind `view traverse`

Finally we can get back to our traverse example and understand why it does +what it does. We ended up with the following expression.

(:) <$> Const "hello" <*> ((:) <$> Const "world" <*> pure [])
+

With the recently gained knowledge we can see that it doesn’t matter what +function we apply to our Const. In this case it is (:) but it might as well +be undefined.

λ> getConst $ undefined <$> Const "hello"
+"hello"
+

This means that the whole (:) <$> has absolutely no meaning. It’s just there +so that our Const "hello" can take on a type of a function application, so +that we can use <*>. In fact the only thing that does something is the <*> +combinator, which calls mappend on the hidden values, but let’s take this +step by step.

First we replace pure [] with the actual value it returns in this case.

(:) <$> Const "hello" <*> ((:) <$> Const "world" <*> Const "")
+

Next we can evaluate the expression in the parentheses, which if you look at our definition of <*> for Const will just reduce to the following.

Const $ "world" `mappend` ""
+

Which evaluates to just Const "world". Now we’re left with the following.

(:) <$> Const "hello" <*> Const "world"
+

Which again just ends up being:

Const $ "hello" `mappend` "world"
+

Which evaluates to Const "helloworld". Our initial expression applied +getConst to the result of this expression, which would just yield +"helloworld".

λ> getConst $ Const $ "hello" `mappend` "world"
+"helloworld"
+

There we go, now we have a full understanding of why view traverse requires +the traversed values to be a Monoid.
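The same reasoning also suggests a way to get the [String] we originally wanted: if every title is wrapped in a singleton list before being hidden in the Const, mappend will concatenate lists instead of strings. Here’s a sketch using only base. The titles helper is hypothetical, and I’m not claiming this is how the lens library implements its own combinators.

```haskell
import Data.Functor.Const (Const (..))

data User = User String [Post] deriving Show
data Post = Post String deriving Show

posts :: Functor f => ([Post] -> f [Post]) -> User -> f User
posts f (User n p) = fmap (\p' -> User n p') (f p)

title :: Functor f => (String -> f String) -> Post -> f Post
title f (Post t) = fmap Post (f t)

users :: [User]
users = [User "john" [Post "hello", Post "world"], User "bob" [Post "foobar"]]

-- Hypothetical helper: each focused title becomes a one-element list, so the
-- Monoid that gets smashed together is [String] rather than String.
titles :: [User] -> [String]
titles = getConst . (traverse . posts . traverse . title) (\t -> Const [t])
```

Here titles users evaluates to ["hello","world","foobar"].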

In the next article we’ll focus on some other use cases for traverse and how +to use it with combinators like toListOf, etc.

Foldable and Traversable

Wed, 30 Jul 2014 00:00:00 +0000

Before we can get into the more advanced topics on Lenses, it is +important to really understand both Foldable and Traversable, which +is the motivation behind this article.

Let’s begin with Foldable. Foldable represents structures which can +be folded. What does that mean? Here are a few examples:

Calculating the sum of a list.
Calculating the product of a list.
Folding a tree to get a maximum value.

We can describe a fold as taking a structure and reducing it to a +single result. That’s also why some languages have a reduce +function instead of a fold, even though they mean the same thing.

It is important to really understand the concept behind a fold in +general, not in terms of specific functions like foldl or foldr. +Whenever you see the word fold in a function name, think reducing a +larger structure to a single result.

Now comes the time to take a look at the Foldable type class.

class Foldable t where
+    fold    :: Monoid m => t m -> m
+    foldMap :: Monoid m => (a -> m) -> t a -> m
+
+    foldr   :: (a -> b -> b) -> b -> t a -> b
+    foldr'  :: (a -> b -> b) -> b -> t a -> b
+
+    foldl   :: (b -> a -> b) -> b -> t a -> b
+    foldl'  :: (b -> a -> b) -> b -> t a -> b
+
+    foldr1  :: (a -> a -> a) -> t a -> a
+    foldl1  :: (a -> a -> a) -> t a -> a
+

We won’t go into detail on all of these, since foldl, foldr, +foldl', foldr', foldl1 and foldr1 work the same as their +counterparts from Data.List.

What is interesting here is that fold and foldMap both involve a Monoid: fold requires the elements of the Foldable to be Monoids, and foldMap requires a function that turns each element into one. Let’s just quickly take a look at what a Monoid is.

class Monoid a where
+    mempty  :: a
+    mappend :: a -> a -> a
+    mconcat :: [a] -> a
+

Nothing really special here, Monoid just simply defines a zero element +via mempty and an associative operation mappend for combining two +Monoids into one. mconcat is just a convenience method which has a +default implementation using mappend.

mconcat :: [a] -> a
+mconcat = foldr mappend mempty
+

fold and foldMap

The interesting thing about fold and foldMap is that they use a Monoid instead of a function to give us the final result. This might not be obvious at first, but by picking the right Monoid it is essentially the same as passing in a function, since it will just use the mappend defined for that Monoid instance.

One very very very important aspect to understand here is that it is the +fold function that requires the elements of Foldable to have a +Monoid instance, while Foldable itself does not have that +restriction.

The result of this is that we can have something like [Int], where [] is a Foldable but Int is not a Monoid. As long as we don’t use any of the functions from Foldable that require a Monoid, we’ll be OK. Here’s an example.

λ> foldr1 (+) [1,2,3,4]
+10
+λ> fold ["hello", "world"]
+"helloworld" -- Strings are Monoids using concatenation
+λ> fold [1,2,3,4]
+<interactive>:1:1:
+    No instance for (Monoid a0) arising from a use of ‘it’
+

See how the problem only arises when we used fold with Int. We could +however wrap those Ints in a Monoid such as Sum or Product and +fold them then.

λ> fold [Sum 1, Sum 2, Sum 3, Sum 4]
+Sum {getSum = 10}
+

This might seem tedious at first, but remember our Foldable type +class, as it also defines a function that is perfect for this particular +use case: foldMap :: Monoid m => (a -> m) -> t a -> m. We can read +this as Given a foldable containing things that aren’t Monoids, and a +function that can convert a single thing to a Monoid, I’ll give you back +a Monoid by traversing the foldable, converting everything to Monoids +and folding them together.

Here’s our previous example, but now using foldMap.

λ> foldMap Sum [1,2,3,4]
+Sum {getSum = 10}
+
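Picking a different Monoid gives a different fold without changing anything else. Here’s a short sketch using a few of the wrappers from Data.Monoid. The function names on the left are made up for the example.

```haskell
import Data.Monoid (All (..), Any (..), Product (..), Sum (..))

-- Each helper only differs in which Monoid the elements are converted to.
total :: [Int] -> Int
total = getSum . foldMap Sum

multiplied :: [Int] -> Int
multiplied = getProduct . foldMap Product

anyEven :: [Int] -> Bool
anyEven = getAny . foldMap (Any . even)

allEven :: [Int] -> Bool
allEven = getAll . foldMap (All . even)
```

For the list [1,2,3,4] these give 10, 24, True and False respectively.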

If you think about this for a little while, you’ll see that we can even implement fold in terms of foldMap. Why? When using foldMap we need to provide a way to convert each item to a Monoid, but if those items already are Monoids, we don’t need to do any conversion!

fold :: Monoid m => t m -> m
+fold xs = foldMap id xs
+

Here’s the same in more steps.

λ> :t id
+:: a -> a
+λ> :t fold
+:: (Monoid m, Foldable t) => t m -> m
+λ> :t foldMap
+:: (Monoid m, Foldable t) => (a -> m) -> t a -> m
+λ> :t foldMap id
+:: (Monoid m, Foldable t) => t m -> m
+

The actual Foldable type class requires either foldMap or foldr, +but for the sake of this article we won’t be looking into foldr.
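To see what defining only foldMap buys us, here’s a sketch of a Foldable instance for a small binary tree, which also covers the folding a tree to get a maximum value example from the beginning. The Tree type and the sample value are hypothetical.

```haskell
import Data.Foldable (toList)
import Data.Monoid (Sum (..))

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- foldMap is the only method we write, the rest come from the defaults.
instance Foldable Tree where
    foldMap _ Leaf         = mempty
    foldMap f (Node l x r) = mconcat [foldMap f l, f x, foldMap f r]

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 5 (Node Leaf 3 Leaf)

total :: Int
total = getSum (foldMap Sum sample)

largest :: Int
largest = maximum sample

flattened :: [Int]
flattened = toList sample
```

Here total is 9, largest is 5 and flattened is [1,5,3], even though the instance never mentions maximum or toList.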

Traversable

Now that we have an understanding of Foldable we can move on to something more fun, Traversable. Traversable represents data structures which can be traversed while preserving the shape. This is why there is no filter or concatMap, since Traversable only defines a way to move through the data structure, but not a way to change it.

class (Functor t, Foldable t) => Traversable t where
+    traverse  :: Applicative f => (a -> f b) -> t a -> f (t b)
+    sequenceA :: Applicative f => t (f a) -> f (t a)
+

If you look in the documentation for +Traversable +you might note that there is also mapM and sequence, but we won’t be +covering those in this article, since their implementation isn’t +interesting and can be done mechanically.

This might look a little intimidating at first, but don’t worry, we’ll +do this step by step by implementing a Traversable instance for a +list.

instance Traversable [] where
+     traverse f xs = _
+

Since the implementation will be recursive we first need to define the +base case for our recursion, which will be the empty list. The type that +we’re looking for is f [b], but because the list we’re traversing is +empty, we just need to wrap it in the Applicative context.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = _
+

Next goes the actual recursive implementation. We have a function f :: a -> f b and a head of the list which has the type a. The only thing we can do at this point is apply the function.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = f x
+

This won’t typecheck of course, because we’re returning f b instead of f [b]. We could cheat here a little bit and just try to apply some function f b -> f [b] to get the result. We can use (:[]), which has a type of a -> [a], and fmap it over what we have.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = fmap (:[]) (f x)
+

Now we have an implementation that type checks, but it is still wrong, +since it doesn’t satisfy the rule that a traversal must not change the +shape of the structure it is traversing, and here we are just dropping +the rest of the list. We need to find a way to use recursion and somehow +combine the results.

By looking at the type of traverse :: (a -> f b) -> t a -> f (t b), or +in our case specifically traverse :: (a -> f b) -> [a] -> f [b] we can +see that using traverse recursively on the tail of the list would give +us the type we need.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = (f x) _ traverse f xs
+

Now we have two values, one of type f b and one of type f [b], which +are basically the head and the tail of the list, both wrapped in an +Applicative context. We also have a function (:) :: a -> [a] -> [a], +which concatenates a head and a tail together into a single list.

Knowing all of this it just comes down to a basic use of Applicative +where we have a function of two arguments and need to apply it to two +values in the Applicative context. We can do this in two different +ways.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = (:) <$> f x <*> traverse f xs
+

And an alternative definition using liftA2.

instance Traversable [] where
+    traverse _ [] = pure []
+    traverse f (x:xs) = liftA2 (:) (f x) (traverse f xs)
+

It should be pretty clear now that we need the Applicative to be able +to actually implement traverse. If all we had was a Functor we +wouldn’t be able to combine the f b and f [b] together.
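To get a feeling for what the Applicative does during a traversal, here’s a small sketch using Maybe as the effect. The positive and allPositive functions are hypothetical and only serve as an example.

```haskell
-- Succeeds only for numbers greater than zero.
positive :: Int -> Maybe Int
positive n
    | n > 0     = Just n
    | otherwise = Nothing

-- The list keeps its shape if every element passes, and the whole
-- traversal turns into Nothing as soon as a single element fails.
allPositive :: [Int] -> Maybe [Int]
allPositive = traverse positive
```

So allPositive [1,2,3] gives Just [1,2,3], while allPositive [1,-2,3] gives Nothing. The <*> for Maybe is what combines the individual results into one.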

sequenceA

Now that we have traverse we can move on to define sequenceA. Here’s +a specific type for our list instance.

sequenceA :: Applicative f => [f a] -> f [a]
+

If you’re familiar with sequence :: Monad m => [m a] -> m [a] from +Control.Monad then you can see how these two functions are doing the +same thing. It simply takes the Applicative effects, runs them and +pulls them out of the list.

The implementation is really simple. Starting out with an empty list, we +just need to wrap it in the Applicative context.

sequenceA [] = pure []
+

Next comes the actual recursive implementation. If we pattern match on the head and the tail of the list, we’ll yet again get f a and [f a].

sequenceA (x:xs) = _
+

We can call sequenceA recursively on the tail to get f [a].

sequenceA [] = pure []
+sequenceA (x:xs) = sequenceA xs
+

But of course this isn’t good enough. We need a way to combine the head +and the tail while they’re both wrapped in an Applicative context. +This can be done in the same way as we did previously with traverse, +using (:) and the Applicative functions <$> and <*>.

sequenceA [] = pure []
+sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
+

Or alternatively using liftA2 again.

sequenceA [] = pure []
+sequenceA (x:xs) = liftA2 (:) x (sequenceA xs)
+

That’s it, we have a working implementation for sequenceA.
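Here’s a quick sketch of what sequenceA does with two different Applicatives, using the version that ships with the Prelude. The names on the left are made up for the example.

```haskell
-- With Maybe, a single Nothing makes the whole result Nothing.
allPresent :: Maybe [Int]
allPresent = sequenceA [Just 1, Just 2, Just 3]

oneMissing :: Maybe [Int]
oneMissing = sequenceA [Just 1, Nothing, Just 3]

-- With lists as the Applicative, we get every combination of choices.
combinations :: [[Int]]
combinations = sequenceA [[1, 2], [3, 4]]
```

These evaluate to Just [1,2,3], Nothing and [[1,3],[1,4],[2,3],[2,4]].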

Implementing sequenceA with traverse and vice versa

If we now look at our implementations for traverse and sequenceA we +can definitely see some similarity there.

traverse _ [] = pure []
+traverse f (x:xs) = (:) <$> f x <*> traverse f xs
+
+sequenceA [] = pure []
+sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
+

The only difference is that traverse takes a function and applies it +to the head of the list, while sequenceA simply uses the head as it +is. Knowing this we can actually define sequenceA using traverse and +the id function.

sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
+sequenceA xs = traverse id xs
+

Could we do the same thing the other way around though? Yes! We most +certainly can define traverse by using sequenceA and the fact that +every Traversable is also a Functor. Let’s take this step by step.

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
+traverse f xs = _
+

We only have one way of applying our function a -> f b to the t a +and that is using fmap, which would give us t (f b).

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
+traverse f xs = _ $ fmap f xs
+

Now we’ll get an error saying that we need a function t (f b) -> f (t b), which is exactly what sequenceA does!

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
+traverse f xs = sequenceA $ fmap f xs
+
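As a quick sanity check, the definition above can be tried against the library's traverse. This is only a sketch: traverseVia and positive are hypothetical names introduced for the example.

```haskell
-- traverse rebuilt from sequenceA and fmap, under the hypothetical
-- name traverseVia so it can be compared with the real traverse.
traverseVia :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverseVia f xs = sequenceA $ fmap f xs

-- A function that can fail: only positive numbers are accepted.
positive :: Int -> Maybe Int
positive n
    | n > 0 = Just n
    | otherwise = Nothing

-- All elements pass the check.
accepted :: Maybe [Int]
accepted = traverseVia positive [1, 2, 3]

-- One element fails, so the whole traversal fails.
rejected :: Maybe [Int]
rejected = traverseVia positive [1, -2, 3]
```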

Traversable with default implementations

Given the two implementations we just got we can rewrite our initial +Traversable type class to use those as a default implementation for +both functions.

class (Functor t, Foldable t) => Traversable t where
+    traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
+    traverse f xs = sequenceA $ fmap f xs
+
+    sequenceA :: Applicative f => t (f a) -> f (t a)
+    sequenceA xs = traverse id xs
+

This is actually how it’s done in the Data.Traversable module, except +that if you look at the source code you’ll see the functions defined in +point free style.

class (Functor t, Foldable t) => Traversable t where
+    traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
+    traverse f = sequenceA . fmap f
+
+    sequenceA :: Applicative f => t (f a) -> f (t a)
+    sequenceA = traverse id
+

Default implementation for Functor and Foldable using Traversable

It might not be so obvious at first, but a Traversable is a very +powerful concept. So powerful that it actually allows us to define both +Functor and Foldable if we have just a single function from +Traversable. The Data.Traversable module defines two functions, +fmapDefault and foldMapDefault, which can be used as an +implementation for fmap and foldMap if we so desire.

fmapDefault :: Traversable t => (a -> b) -> t a -> t b
+foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+

The way we’re going to implement these is very similar to what we did in the Lens introduction article. If this section is too hard to follow, I recommend reading the Lens article first and then coming back here. Everything will make a lot more sense.

Let’s first compare the types of traverse and fmap.

fmap :: Functor f => (a -> b) -> f a -> f b
+traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
+

The difference is that the function passed to traverse returns a value +wrapped in Applicative context, and the result is also wrapped. If we +could find a way to wrap the value after we apply the function, and then +unwrap it at the end, we would get exactly the same type as fmap.

We can use the Identity functor to do this, which defines a way to +unwrap it using runIdentity :: Identity a -> a.

fmapDefault :: Traversable t => (a -> b) -> t a -> t b
+fmapDefault f x = _
+

We don’t have that many options here. To be able to give the function f to traverse we need to change its type from a -> b to a -> f b. That’s where Identity comes in.

fmapDefault :: Traversable t => (a -> b) -> t a -> t b
+fmapDefault f x = traverse (Identity . f) x
+

Now our types don’t align, since we are supposed to return t b but we +are returning Identity (t b). The solution here is the above mentioned +runIdentity which simply unwraps the value.

fmapDefault :: Traversable t => (a -> b) -> t a -> t b
+fmapDefault f x = runIdentity $ traverse (Identity . f) x
+

And once more in point free style.

fmapDefault :: Traversable t => (a -> b) -> t a -> t b
+fmapDefault f = runIdentity . traverse (Identity . f)
+

Compare this to the definition of over and you can see how it looks +and feels almost exactly the same.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f = runIdentity . ln (Identity . f)
+

We’ll explain how this relates to Lenses in more detail in a followup +article, but for now let’s move on to foldMapDefault.
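Before leaving fmapDefault behind, here is a minimal sketch that checks the Identity trick really behaves like fmap. The name fmapVia is an assumption used to keep the example separate from the library's fmapDefault.

```haskell
import Data.Functor.Identity (Identity (..))

-- Same shape as fmapDefault, under the hypothetical name fmapVia.
-- Identity wraps each result so traverse accepts the function,
-- and runIdentity removes the wrapper at the end.
fmapVia :: Traversable t => (a -> b) -> t a -> t b
fmapVia f = runIdentity . traverse (Identity . f)

-- Works on lists...
doubled :: [Int]
doubled = fmapVia (* 2) [1, 2, 3]

-- ...and on any other Traversable, such as Maybe.
shouted :: Maybe String
shouted = fmapVia (++ "!") (Just "hello")
```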

Implementing foldMapDefault

This part is very hard to understand, so be careful.

If we compare the type of foldMapDefault with traverse we can yet +again see some similarity.

foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
+

The difference from fmapDefault is that now we need a way to convert +each element of the Traversable to a Monoid.

We will use the Const applicative here, which is an Applicative whenever the value it wraps is a Monoid.

foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+foldMapDefault f x = _
+

As previously we can only use traverse together with a function a -> f b, but we have a -> m. Composing it with Const turns the m into a Const m b, which gives us the a -> f b we need.

foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+foldMapDefault f x = traverse (Const . f) x
+

Again we’re faced with the problem of having Const m (t b) instead of +m, which can be solved using getConst.

foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+foldMapDefault f x = getConst $ traverse (Const . f) x
+

And a point free version.

foldMapDefault :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
+foldMapDefault f = getConst . traverse (Const . f)
+

This is also very similar to one of the functions Lens provides, in +particular view.

view :: Lens s a -> s -> a
+view ln = getConst . ln Const
+
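Going back to foldMapDefault for a moment, the Const trick can be tried out on ordinary lists. This is a sketch under the assumption that we call our copy foldMapVia; Sum comes from Data.Monoid and is used here only as a convenient Monoid.

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Sum (..))

-- Same shape as foldMapDefault, under the hypothetical name foldMapVia.
-- Const throws away the rebuilt structure and keeps only the
-- accumulated Monoid value, which getConst then extracts.
foldMapVia :: (Traversable t, Monoid m) => (a -> m) -> t a -> m
foldMapVia f = getConst . traverse (Const . f)

-- Summing a list by mapping every element into the Sum monoid.
total :: Int
total = getSum (foldMapVia Sum [1, 2, 3, 4])

-- Strings are already a Monoid, so id is enough to concatenate them.
joined :: String
joined = foldMapVia id ["foo", "bar"]
```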

Implementing Functor and Foldable with Traversable

Now that we understand how both fmapDefault and foldMapDefault work, +we can use them to define a Functor and a Foldable instance for any +Traversable we might have.

We can test this out by defining a simple list type.

data List a = Nil
+            | Cons a (List a)
+            deriving Show
+
+instance Functor List where
+    fmap = fmapDefault
+
+instance Foldable List where
+    foldMap = foldMapDefault
+
+instance Traversable List where
+    traverse _ Nil = pure Nil
+    traverse f (Cons x xs) = fmap Cons (f x) <*> traverse f xs
+

We used fmap = fmapDefault and foldMap = foldMapDefault to define +our Functor and Foldable instances, which is all made possible by +also having a Traversable instance. Let’s test this out to make sure +it works!

λ> traverse (\x -> Just (x + 1)) (Cons 1 (Cons 2 (Cons 3 Nil)))
+Just (Cons 2 (Cons 3 (Cons 4 Nil)))
+λ> fold (Cons "hello" (Cons "world" Nil))
+"helloworld"
+λ> fmap (+1) (Cons 1 (Cons 2 (Cons 3 Nil)))
+Cons 2 (Cons 3 (Cons 4 Nil))
+

It might be surprising, but everything works as it is supposed to.
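To reproduce the session above, the List definition needs a couple of imports that were left out: fmapDefault and foldMapDefault live in Data.Traversable, and fold comes from Data.Foldable. The following is a self-contained sketch of the same code. The Eq instance and the named example values are additions made only so the results can be compared.

```haskell
import Data.Foldable (fold)
import Data.Traversable (fmapDefault, foldMapDefault)

-- Eq is derived in addition to Show so results can be compared.
data List a = Nil
            | Cons a (List a)
            deriving (Show, Eq)

instance Functor List where
    fmap = fmapDefault

instance Foldable List where
    foldMap = foldMapDefault

-- The only instance written by hand; the other two are derived from it.
instance Traversable List where
    traverse _ Nil = pure Nil
    traverse f (Cons x xs) = Cons <$> f x <*> traverse f xs

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))

-- fmap goes through fmapDefault, which in turn uses traverse.
incremented :: List Int
incremented = fmap (+ 1) example

-- fold goes through foldMapDefault, which also uses traverse.
concatenated :: String
concatenated = fold (Cons "hello" (Cons "world" Nil))
```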

Building Monad Transformers - Part 1

Tue, 22 Jul 2014 00:00:00 +0000

In this article we’ll focus on building our own monad transformers. +We’ll start out with an example code and improve it by building a simple +wrapper over IO (Maybe a).

The following example is really simple, but I’m sure you can imagine doing something similar in your own application. The findById function is there just to simulate a database query that might not find a result.

data User = User deriving Show
+
+findById :: Int -> IO (Maybe User)
+findById 1 = return $ Just User
+findById _ = return Nothing
+
+findUsers :: Int -> Int -> IO (Maybe (User, User))
+findUsers x y = do
+    muser1 <- findById x
+
+    case muser1 of
+        Nothing -> return Nothing
+        Just user1 -> do
+            muser2 <- findById y
+
+            case muser2 of
+                Nothing -> return Nothing
+                Just user2 -> do
+                    return $ Just (user1, user2)
+

While there’s nothing bad about case statements with pattern matching, I’m sure we can all agree that this approach can easily get out of hand.

One solution that won’t work all the time might be to fetch both of the users at the same time, which would allow us to make use of the Maybe monad. If our findById function had no side effects, we could’ve written this.

findById :: Int -> Maybe User
+findById 1 = Just User
+findById _ = Nothing
+
+loadUsers :: Maybe (User, User)
+loadUsers = do
+    user1 <- findById 1
+    user2 <- findById 2
+    return (user1, user2)
+

Because Maybe is implemented in a way that it stops evaluating when it hits a Nothing, we get the behavior we intended without pattern matching. If one of our findById calls fails to return a user, the whole function will return Nothing.

Unfortunately the act of finding a user needs to reach out to the real world, +which forces the IO monad upon us, making this approach impossible. We +somehow need to be able to teach IO the notion of failure.

Wrapping `IO` in `MaybeIO`

Let’s introduce a new monad which will simply wrap our IO computations into a +Maybe.

data MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }
+

The next step is to make MaybeIO into a Monad, which will allow us to use it inside a do block, but first things first. The next version of GHC (7.10) will require every Monad to also be an Applicative, which also means that every Monad must be a Functor. We’ll follow this and start out with a Functor instance.

instance Functor MaybeIO where
+    fmap f m = undefined
+

We’ll use type holes to guide us while implementing these instances. First let’s recap the type of fmap, which is (a -> b) -> f a -> f b. This means we have a function f :: a -> b and a functor value m :: f a, or specifically m :: MaybeIO a.

Before we can do anything to the m we need to unwrap MaybeIO to get to the +insides. We’ll use pattern matching to do that since it’s more concise +than using runMaybeIO.

instance Functor MaybeIO where
+    fmap f (MaybeIO m) = undefined
+

We only have two things available to us, the function f :: a -> b which only works on the type a, and the fact that both Maybe and IO are also Functor instances, which means we can use fmap to reach deep into the IO (Maybe a) to apply our function f to get the result.

Here comes a little trick, since fmap can also be thought of as (a -> b) -> (f a -> f b). If we compose fmap with fmap, it gives us exactly what we +need, a way to reach two functors deep to apply a function.

λ> :t fmap.fmap
+:: (Functor f, Functor g) => (a -> b) -> f (g a) -> f (g b)
+

Substituting our types we get the following.

fmap.fmap :: (a -> b) -> (IO (Maybe a) -> IO (Maybe b))
+

We are not there quite yet, let’s see what happens if we use this +approach to implement the Functor instance.

instance Functor MaybeIO where
+    fmap f (MaybeIO m) = (fmap.fmap) f m
+
+-- Couldn't match type ‘Maybe’ with ‘MaybeIO’
+

We’re returning the wrong type! The original value passed in was MaybeIO a +and we’re returning IO (Maybe b) instead of MaybeIO b. Let’s add a type +hole to make this crystal clear.

instance Functor MaybeIO where
+    fmap f (MaybeIO m) = _ $ (fmap.fmap) f m
+
+-- Found hole ‘_’ with type: IO (Maybe b) -> MaybeIO b
+

Now remember how in the beginning we said we’ll be wrapping the IO (Maybe a) +into a MaybeIO? We can do that using the constructor of MaybeIO!

instance Functor MaybeIO where
+    fmap f (MaybeIO m) = MaybeIO $ (fmap.fmap) f m
+

There you go, a Functor instance for MaybeIO.
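A short sketch shows the instance at work. The values found and missing are hypothetical stand-ins for a lookup that succeeds and one that fails, and demo is just a helper that runs both.

```haskell
data MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

instance Functor MaybeIO where
    fmap f (MaybeIO m) = MaybeIO $ (fmap . fmap) f m

-- Hypothetical stand-in for a lookup that succeeds.
found :: MaybeIO Int
found = MaybeIO (return (Just 41))

-- Hypothetical stand-in for a lookup that fails.
missing :: MaybeIO Int
missing = MaybeIO (return Nothing)

-- fmap reaches through both layers; a Nothing is left untouched.
demo :: IO (Maybe Int, Maybe Int)
demo = do
    a <- runMaybeIO (fmap (+ 1) found)
    b <- runMaybeIO (fmap (+ 1) missing)
    return (a, b)
```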

`Applicative` instance for `MaybeIO`

The next step is to implement an Applicative instance for our MaybeIO +wrapper. Here’s how the Applicative class looks in case you forgot.

class Applicative m where
+    pure :: a -> m a
+    (<*>) :: m (a -> b) -> m a -> m b
+

In terms of our MaybeIO the types would look as following.

pure :: a -> MaybeIO a
+(<*>) :: MaybeIO (a -> b) -> MaybeIO a -> MaybeIO b
+

Implementing pure is simple, we just need to wrap a given value into a +minimal context. Since both Maybe and IO are an instance of Applicative, +we can use their pure much as we used fmap when implementing the Functor instance (don’t forget to import Control.Applicative.)

instance Applicative MaybeIO where
+    pure = MaybeIO . pure . pure
+

We could’ve also written this more explicitly using Just instead of pure +for wrapping the value in a Maybe.

instance Applicative MaybeIO where
+    pure = MaybeIO . pure . Just
+

But moving on, now comes the hard part, implementing <*>. This is probably +the hardest part of the whole article, so don’t worry if it seems a bit +complicated. First we need to pattern match to get rid of the MaybeIO +wrapper, and then we also need to wrap the value on the right hand side in the +last step.

instance Applicative MaybeIO where
+    pure = MaybeIO . pure . Just
+    MaybeIO f <*> MaybeIO m = MaybeIO $ _
+
+-- Found hole ‘_’ with type: IO (Maybe b)
+

The type hole tells us that we need to somehow get to an IO (Maybe b) with the given IO (Maybe (a -> b)) and IO (Maybe a). This seems like a typical reach into a box/context and apply a function kind of problem, and it is, but we do need to do something which isn’t so apparent at first.

Both Maybe and IO are an instance of Applicative, which means we need to +somehow use <*> to apply the boxed function to the boxed value (pardon me for +saying boxed here, but it just seems like the right analogy here.)

The problem is that we can only use <*> to apply a function nested one level deep, since the type is m (a -> b) -> m a -> m b. Since <*> is a two argument function we can’t use a simple ., so we need to look into the documentation for Applicative and find the function liftA2, which works just like fmap on functors, but for two argument functions.

λ> :t liftA2
+:: Applicative f => (a -> b -> c) -> f a -> f b -> f c
+

If we combine these two together we get exactly what we need: a function which takes a function nested in two applicatives and a value nested in the same two applicatives, and applies the function to that value.

λ> :t liftA2 (<*>)
+:: (Applicative f, Applicative g) =>
+     f (g (a -> b)) -> f (g a) -> f (g b)
+

Let’s substitute our types once again to see how this exactly matches to what +we need.

liftA2 (<*>)
+:: IO (Maybe (a -> b)) -> IO (Maybe a) -> IO (Maybe b)
+

We already have both of the arguments of the correct types, which means we can +just apply the function to them and get our instance.

instance Applicative MaybeIO where
+    pure = MaybeIO . pure . Just
+    MaybeIO f <*> MaybeIO m = MaybeIO $ liftA2 (<*>) f m
+

In the next step we’ll move onto implementing the Monad instance. Make sure +you understand what we’ve done so far.
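As a way of checking that understanding, the following sketch puts the Functor and Applicative instances together and combines two MaybeIO values. The helpers sumBoth and demo are hypothetical names introduced for the example.

```haskell
import Control.Applicative (liftA2)

data MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

instance Functor MaybeIO where
    fmap f (MaybeIO m) = MaybeIO $ (fmap . fmap) f m

instance Applicative MaybeIO where
    pure = MaybeIO . pure . Just
    -- liftA2 lifts Maybe's <*> so that it works inside IO.
    MaybeIO f <*> MaybeIO m = MaybeIO $ liftA2 (<*>) f m

-- Add the results of two computations, if both of them produce a value.
sumBoth :: MaybeIO Int -> MaybeIO Int -> IO (Maybe Int)
sumBoth x y = runMaybeIO ((+) <$> x <*> y)

demo :: IO (Maybe Int, Maybe Int)
demo = do
    ok <- sumBoth (pure 1) (pure 2)
    failed <- sumBoth (pure 1) (MaybeIO (return Nothing))
    return (ok, failed)
```

One thing worth keeping in mind is that liftA2 (<*>) runs both IO actions before combining the Maybe values, so the effects of the right hand side happen even when the left hand side turns out to be Nothing.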

`Monad` instance for `MaybeIO`

Now comes the final step that we’ve been waiting for, implementing a Monad +instance for our MaybeIO wrapper. As we did before, here’s how the Monad +class looks.

class Monad m where
+    return :: a -> m a
+    (>>=) :: m a -> (a -> m b) -> m b
+

We can already see that return will be exactly the same as pure for our +Applicative, so let’s do that first.

instance Monad MaybeIO where
+    return = pure
+

Next comes the implementation of >>= or bind. First the initial structure.

instance Monad MaybeIO where
+    return = pure
+    MaybeIO m >>= f = MaybeIO $ _
+

We have a value of type m :: IO (Maybe a) and a function that we need to +apply to the inner a which has a type f :: a -> MaybeIO b. We can use >>= +to get to the value inside the IO monad.

instance Monad MaybeIO where
+    return = pure
+    MaybeIO m >>= f = MaybeIO $ m >>= \x -> _
+

This leaves us with x :: Maybe a, which is just one pattern match away from +the final solution.

instance Monad MaybeIO where
+    return = pure
+    MaybeIO m >>= f = MaybeIO $ m >>= \x -> case x of
+        Nothing -> return $ Nothing
+        Just val -> runMaybeIO $ f val
+

A very important thing to note here is that in the case of Just val we +need to unwrap the MaybeIO using runMaybeIO. One might think that we +could instead write it like this.

instance Monad MaybeIO where
+    return = pure
+    MaybeIO m >>= f = m >>= \x -> case x of
+        Nothing -> MaybeIO $ return $ Nothing
+        Just val -> f val
+
+-- Couldn't match type ‘IO’ with ‘MaybeIO’
+

The problem here is that m >>= \x -> ... must produce a value in IO, but we’re trying to return a MaybeIO. This is why we need to unwrap the result of f val and then wrap it again after doing >>=, as we did in the previous example.

Using `MaybeIO` to cleanup our initial example

We managed to build ourselves a monad which combines the effects of IO and Maybe together, which means we can use it to represent IO computations which can fail. This is perfect for our initial example which uses findById :: Int -> IO (Maybe User).

Since the type of our computation is MaybeIO we need to wrap the findById +function to make use of the monad instance for MaybeIO.

smartFindUsers :: Int -> Int -> MaybeIO (User, User)
+smartFindUsers x y = do
+    user1 <- MaybeIO $ findById x
+    user2 <- MaybeIO $ findById y
+
+    return (user1, user2)
+

We can even go one step further and keep the original return type of findUsers, IO (Maybe (User, User)), by unwrapping the MaybeIO.

smartFindUsers :: Int -> Int -> IO (Maybe (User, User))
+smartFindUsers x y = runMaybeIO $ do
+    user1 <- MaybeIO $ findById x
+    user2 <- MaybeIO $ findById y
+
+    return (user1, user2)
+

Now let’s go ahead and test this in GHCi to make sure we didn’t break anything.

λ> smartFindUsers 1 1
+Just (User,User)
+

Our new version works exactly the same as the old one, but without the otherwise necessary error handling boilerplate. Much like monads allow you to capture control flow patterns, you can use monad transformers to add additional control flow to your existing monads without sacrificing readability of your code.
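The GHCi session above only exercises the case where both users exist. Here is a self-contained sketch of everything built so far that can also be used to try the failing cases. The Eq instance on User is an addition made only so that results can be compared.

```haskell
import Control.Applicative (liftA2)

-- Eq is derived in addition to Show so results can be compared.
data User = User deriving (Show, Eq)

data MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

instance Functor MaybeIO where
    fmap f (MaybeIO m) = MaybeIO $ (fmap . fmap) f m

instance Applicative MaybeIO where
    pure = MaybeIO . pure . Just
    MaybeIO f <*> MaybeIO m = MaybeIO $ liftA2 (<*>) f m

instance Monad MaybeIO where
    -- On Nothing we stop; otherwise we run the rest of the computation.
    MaybeIO m >>= f = MaybeIO $ m >>= \x -> case x of
        Nothing -> return Nothing
        Just val -> runMaybeIO $ f val

-- Simulated database lookup: only the user with id 1 exists.
findById :: Int -> IO (Maybe User)
findById 1 = return $ Just User
findById _ = return Nothing

smartFindUsers :: Int -> Int -> IO (Maybe (User, User))
smartFindUsers x y = runMaybeIO $ do
    user1 <- MaybeIO $ findById x
    user2 <- MaybeIO $ findById y
    return (user1, user2)
```

Calling smartFindUsers 1 2 or smartFindUsers 2 1 gives Nothing, since in both cases one of the lookups fails.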

The next step is to make our MaybeIO into an actual transformer by swapping +IO for any Monad.

Generalizing `MaybeIO` to `MaybeT`

The real monad transformers you’ll encounter in the world of Haskell are a bit +more generic than the one we just implemented. Instead of hard-coding the IO +monad we’ll pass it in as a type parameter, resulting in the following +definition of MaybeT.

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
+

There aren’t any significant changes, we just introduced a new type parameter +which will be the monad we’re wrapping. Since everything else remains almost +exactly the same, I’ll just show the Monad implementation here.

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
+
+instance Monad m => Monad (MaybeT m) where
+    return = MaybeT . return . Just
+    MaybeT m >>= f = MaybeT $ do value <- m
+                                 case value of
+                                     Nothing -> return Nothing
+                                     Just x -> runMaybeT $ f x
+

The only notable thing here is that our type parameter m is restricted to be +a Monad as well, since we’re only going to be wrapping monads.
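Since GHC 7.10 the Monad instance alone won't compile without matching Functor and Applicative instances. The sketch below fills those in. It is one possible way to write them, not necessarily how the transformers library does it, and safeDiv and calc are hypothetical helpers used only to exercise the instances.

```haskell
newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

instance Functor m => Functor (MaybeT m) where
    fmap f (MaybeT m) = MaybeT $ (fmap . fmap) f m

instance Monad m => Applicative (MaybeT m) where
    pure = MaybeT . return . Just
    -- Stop at the first Nothing, otherwise apply the function inside.
    MaybeT mf <*> MaybeT mx = MaybeT $ do
        maybeF <- mf
        case maybeF of
            Nothing -> return Nothing
            Just f -> fmap (fmap f) mx

instance Monad m => Monad (MaybeT m) where
    MaybeT m >>= f = MaybeT $ do
        value <- m
        case value of
            Nothing -> return Nothing
            Just x -> runMaybeT $ f x

-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Monad m => Int -> Int -> MaybeT m Int
safeDiv _ 0 = MaybeT (return Nothing)
safeDiv x y = return (x `div` y)

-- Two divisions in a row; either one can abort the computation.
calc :: Monad m => Int -> Int -> Int -> MaybeT m Int
calc a b c = do
    q <- safeDiv a b
    safeDiv q c
```

Because calc is polymorphic in m, the same code runs on top of IO, lists, or any other monad.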

Our findUsers function will be exactly the same, we’ll just need to swap runMaybeIO for runMaybeT and the MaybeIO constructor for MaybeT.

transformerFindUsers :: Int -> Int -> IO (Maybe (User, User))
+transformerFindUsers x y = runMaybeT $ do
+    user1 <- MaybeT $ findById x
+    user2 <- MaybeT $ findById y
+
+    return (user1, user2)
+

Just to make it crystal clear what’s going on here, the function without using +runMaybeT would look as follows.

wrappedFindUsers :: Int -> Int -> MaybeT IO (User, User)
+wrappedFindUsers x y = do
+    user1 <- MaybeT $ findById x
+    user2 <- MaybeT $ findById y
+
+    return (user1, user2)
+

We can even introduce a type alias to have something called MaybeIO using the +MaybeT transformer.

type MaybeIO a = MaybeT IO a
+

This is actually how the well known monads such as Reader, Writer and +State are defined. They’re just type synonyms for the respective transformers +using the Identity monad.

type Reader r = ReaderT r Identity
+type Writer w = WriterT w Identity
+type State s = StateT s Identity
+

If you’re interested in learning more about the Identity monad and how it can +be used in some more advanced settings, take a look at my Introduction to +Lenses article +where it’s explained step by step in great detail.

This concludes the first article in the series on Monad Transformers. Next time +we’ll take a look at how we can stack one transformer onto another and +introduce the MonadTrans and MonadIO type classes.

Mutable State in Haskell

Sun, 20 Jul 2014 00:00:00 +0000

Haskell is a purely functional language, which means there are no side-effects +and all variables are immutable. But as you probably know this isn’t +completely true. All variables are indeed immutable, but there +are ways to construct mutable references where we can change what the +reference points to.

Without side effects we wouldn’t be able to do much, which is +why Haskell gives us the IO monad. In a similar manner we have many ways to +achieve mutable state in Haskell, let’s take a look at them:

IORef
STRef in the ST monad
MVar
TVar in Software Transactional Memory (STM)

IORef

We all know that the IO monad allows us to do arbitrary effects in the real world, so it probably comes as no surprise that it also allows us to create a mutable reference to a type, called IORef (from Data.IORef.) There is not much complicated about IORef, as it only takes a single type parameter, which is the type it’s going to contain.

Before we move into specifics it is important to note here that modifying the IORef is not a pure operation, which means every single operation on the IORef will be inside the IO monad.

Let’s take a look at some of the functions available for manipulating IORefs.

data IORef a
+
+newIORef    :: a -> IO (IORef a)
+readIORef   :: IORef a -> IO a
+writeIORef  :: IORef a -> a -> IO ()
+modifyIORef :: IORef a -> (a -> a) -> IO ()
+

First thing you’ll probably notice is that in order to create an IORef we +need to give it a value. An IORef must always contain a value of a given +type, it is impossible to create it empty. Here’s a simple example.

import Data.IORef
+
+main :: IO ()
+main = do
+    ref <- newIORef (0 :: Int)
+
+    modifyIORef ref (+1)
+
+    readIORef ref >>= print
+

I’ve used 0 :: Int instead of just 0 to make it explicit that we’re using +Ints. If you don’t do that it won’t affect the program but you might get a +warning from the compiler.

There’s not much really happening in this example, we just create a new IORef, increase its value by 1 and then print the result. While this is nice it doesn’t really show much, so let’s make this more complicated.

A common pattern in Haskell is to take an immutable data structure and put it +inside a mutable reference, which basically gives you a mutable version of that +data structure (let’s ignore the fact that there might be a more efficient way +to do this for now.) This will work because we can take any Haskell type and +put it into an IORef. Let’s begin by using Maybe Int to represent a +mutable box for an Int which can be empty.

magic :: IORef (Maybe Int) -> IO ()
+magic ref = do
+    value <- readIORef ref
+
+    case value of
+        Just _ -> return ()
+        Nothing -> writeIORef ref (Just 42)
+
+main :: IO ()
+main = do
+    ref <- newIORef Nothing
+    magic ref
+
+    readIORef ref >>= print
+

First we define a function which takes an IORef (Maybe Int), that is a mutable reference that maybe contains an Int, and produces some side effects. The implementation simply reads the IORef and does nothing if it already has a value, but if it contains Nothing it will replace that value with Just 42. Our main function then simply prints the contents of the IORef, which is Just 42.
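The example only shows the empty case. The following sketch runs magic on both an empty and a filled reference. The helper runMagic is a hypothetical name added for this purpose.

```haskell
import Data.IORef

magic :: IORef (Maybe Int) -> IO ()
magic ref = do
    value <- readIORef ref

    case value of
        Just _ -> return ()
        Nothing -> writeIORef ref (Just 42)

-- Create a reference with the given starting value, run magic on it,
-- and return whatever the reference holds afterwards.
runMagic :: Maybe Int -> IO (Maybe Int)
runMagic start = do
    ref <- newIORef start
    magic ref
    readIORef ref
```

Starting from Nothing we get Just 42 back, while a reference that already holds Just 7 is left alone.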

In-place bubble sort with `IORef`

If you’ve read this far there’s a fair chance that you know how bubble sort +works. The important thing about it is that it works in-place and modifies the +array it is sorting. Here’s a simple implementation in Ruby.

def bubble_sort(list)
+  list.each_index do |i|
+    (list.length - i - 1).times do |j|
+      if list[j] > list[j + 1]
+        list[j], list[j + 1] = list[j + 1], list[j]
+      end
+    end
+  end
+end
+

The key part here is that we’re swapping the elements of the list as we iterate through it. This is something we can’t do in pure Haskell, but we can attempt to do this using IORefs.

We will use a simple Haskell list where each element is IORef Int, so that we +can move them around. The exact type will be [IORef Int].

Disclaimer: I am aware that using a list, which is a linked list, is a +horribly inefficient implementation. The point of this article is however to +show how IORef can be used, not how to properly sort an array.

Our sorting function will accept a plain list of Ints, wrap them all in +IORefs, do the sorting in place, and unwrap the IORefs to return a list of +Ints again.

import Control.Monad (forM_, when)
+import Data.IORef
+
+bubbleSort :: [Int] -> IO [Int]
+bubbleSort input = do
+    let ln = length input
+
+    xs <- mapM newIORef input
+
+    forM_ [0..ln - 1] $ \_ -> do
+        forM_ [0..ln - 2] $ \j -> do
+            let ix = xs !! j
+            let iy = xs !! (j + 1)
+
+            x <- readIORef ix
+            y <- readIORef iy
+
+            when (x > y) $ do
+                writeIORef ix y
+                writeIORef iy x
+
+    mapM readIORef xs
+

Let’s go through the code one step at a time. First we need to calculate the +length of the list being sorted and bind that to a variable.

let ln = length input
+

Next we wrap all of the items in the list inside an IORef. This will allow us +to do the sort in-place by swapping around the values of the references.

xs <- mapM newIORef input
+

Let’s examine the mapM here a little bit. The newIORef function has a type +of a -> IO (IORef a), if we try to partially apply it with map, we’ll get +back the following.

λ> :t map newIORef
+:: [a] -> [IO (IORef a)]
+

This is not very useful for us, since we need a [IORef a]. Fortunately +Haskell provides a sequence :: [IO a] -> IO [a] function which simply pulls +out the monadic effects from a list.

λ> :t sequence . map newIORef
+:: [a] -> IO [IORef a]
+

mapM f is simply a shorthand for sequence . map f. There also exists forM which is exactly like mapM, but the arguments are swapped around.

λ> :t mapM
+mapM :: Monad m => (a -> m b) -> [a] -> m [b]
+λ> :t forM
+forM :: Monad m => [a] -> (a -> m b) -> m [b]
+

One last variant is mapM_ and forM_, which are the same as mapM and forM, only their return value is discarded.

λ> :t mapM_
+mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
+λ> :t forM_
+forM_ :: Monad m => [a] -> (a -> m b) -> m ()
+
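The relationship between these functions can be seen in a small sketch. The function collectBothWays is a hypothetical helper that builds the references in two equivalent ways and then uses forM and forM_ on them.

```haskell
import Control.Monad (forM, forM_)
import Data.IORef

collectBothWays :: [Int] -> IO ([Int], [Int])
collectBothWays input = do
    -- Two equivalent ways of creating a list of references.
    viaSequence <- sequence (map newIORef input)
    viaMapM <- mapM newIORef input

    -- forM is mapM with the arguments flipped.
    xs <- forM viaSequence readIORef

    -- forM_ discards the results; it is used here only for its effects.
    forM_ viaMapM $ \ref -> modifyIORef ref (* 10)

    ys <- mapM readIORef viaMapM
    return (xs, ys)
```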

We chose forM_ because the function we pass in as an argument is quite long and it just ends up being syntactically more pleasing, and because we only care about the effects produced by the function we apply. The outer range [0..ln - 1] simply repeats the inner pass once per element, while the inner range [0..ln - 2] yields the ln - 1 indices j for which both j and j + 1 are valid positions.

forM_ [0..ln - 1] $ \_ -> do
+    forM_ [0..ln - 2] $ \j -> do
+

Next we extract two items from the list, note that these have the type IORef Int.

let ix = xs !! j
+let iy = xs !! (j + 1)
+

We need to read the values from the IORefs in order to be able to compare them

x <- readIORef ix
+y <- readIORef iy
+

and then simply swap the contents if x > y

when (x > y) $ do
+    writeIORef ix y
+    writeIORef iy x
+

The last step is to unwrap the IORefs.

mapM readIORef xs
+

Now that we went through each of the steps, let’s test our bubble sort implementation.

λ> bubbleSort [1,2,3,4]
+[1,2,3,4]
+λ> bubbleSort [4,3,2,1]
+[1,2,3,4]
+λ> bubbleSort [4,99,23,93,17]
+[4,17,23,93,99]
+

It works! Keep in mind that this implementation is horribly slow. If you’re +interested in fast arrays in Haskell check out the vector +library.

ST monad

You’ve probably noticed that the only reason why we need to perform our sorting +algorithm in the IO monad is to have mutable references, which is not ideal +since we’re not really doing any IO.

Luckily for us there is a solution called the state thread monad. I won’t be going into great detail since the API for IORef and STRef is almost exactly the same.

data STRef s a
+
+newSTRef    :: a -> ST s (STRef s a)
+readSTRef   :: STRef s a -> ST s a
+writeSTRef  :: STRef s a -> a -> ST s ()
+modifySTRef :: STRef s a -> (a -> a) -> ST s ()
+

The key difference is that while we can’t ever escape from the IO monad, we do have the ability to escape from the ST monad with the runST :: (forall s. ST s a) -> a function, making the computation pure. The forall s in the type is what stops an STRef from leaking out of the computation.

import Control.Monad.ST
+import Data.STRef
+
+magic :: Int -> Int
+magic x = runST $ do
+    ref <- newSTRef x
+
+    modifySTRef ref (+1)
+
+    readSTRef ref
+

The only thing worth mentioning here compared to the IORef example is that +the type of the function magic is just Int -> Int, because we’re able to +escape the ST monad using a call to runST.

If you’re not sure why this is useful, think of the sorting algorithm we developed earlier. There are many algorithms which require mutation, but which are also pure in their nature. If the only way to achieve mutation was using the IO monad, we wouldn’t be able to implement such algorithms in pure code.
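To make this concrete, here is a sketch of the earlier bubble sort ported from IORef to STRef. The name bubbleSortST is an assumption, and the same disclaimer about using a linked list applies. The only real changes are the swapped function names and the call to runST, which gives us a pure [Int] -> [Int].

```haskell
import Control.Monad (forM_, when)
import Control.Monad.ST (runST)
import Data.STRef

-- The same algorithm as the IORef version, but with a pure type.
bubbleSortST :: [Int] -> [Int]
bubbleSortST input = runST $ do
    let ln = length input

    -- Wrap every element in a mutable reference.
    xs <- mapM newSTRef input

    forM_ [0 .. ln - 1] $ \_ ->
        forM_ [0 .. ln - 2] $ \j -> do
            let ix = xs !! j
            let iy = xs !! (j + 1)

            x <- readSTRef ix
            y <- readSTRef iy

            -- Swap the contents of neighbours that are out of order.
            when (x > y) $ do
                writeSTRef ix y
                writeSTRef iy x

    -- Read the sorted values back out of the references.
    mapM readSTRef xs
```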

MVar

The next type we’re going to take a look at is a little bit more complicated +than IORef, it’s called an MVar. As usual most of the API is similar, but +there is one huge difference. While an IORef must always have a value, MVar +can be empty.

We have two ways of constructing an MVar.

newMVar :: a -> IO (MVar a)
+newEmptyMVar :: IO (MVar a)
+

We also have an additional operation takeMVar :: MVar a -> IO a which takes a +value out of an MVar and leaves it empty. Now comes the important part, if +we try to do takeMVar from an empty MVar, it will block the thread until +someone else puts a value into the MVar. The same thing happens when you +try to putMVar into an MVar that already has a value, it will block until +someone takes that value out.

Try compiling and running the following program.

import Control.Concurrent
+
+main :: IO ()
+main = do
+    a <- newEmptyMVar
+    takeMVar a
+

After a second or so you’ll get an exception and the program will crash.

*** Exception: thread blocked indefinitely in an MVar operation
+

The reason for this is that there are no other threads that could possibly +modify the MVar, so the runtime kills the thread. If we modify the program to +first put a value into the MVar it will work correctly.

main :: IO ()
+main = do
+    a <- newEmptyMVar
+    putMVar a "hello"
+    takeMVar a >>= print
+

Now you might be thinking, how does the runtime know that there are no other +threads that could put a value into that MVar? Using garbage collection!

Every MVar knows which threads are currently blocked on it. If a thread that is currently blocked on an MVar is not accessible from any other running thread, it will get killed since there is no way for it to become unblocked.

If you’re interested in more details about this I recommend reading the amazing +Parallel and Concurrent Programming in +Haskell book, +specifically the chapter on how blocked MVars are +handled.

Synchronizing threads using `MVar`

One of the great benefits of MVars is that they can be used to serve as synchronization primitives for communication between threads.

We can use them as a simple 1 item channel, where we fork a thread that forever +loops trying to read from the MVar and print the result, and in the main +thread we read input from the user and put it into the same MVar.

import Control.Monad
+import Control.Concurrent
+
+main :: IO ()
+main = do
+    a <- newEmptyMVar
+
+    forkIO $ forever $ takeMVar a >>= putStrLn
+
+    forever $ do
+        text <- getLine
+        putMVar a text
+

Everything will work as expected since takeMVar will block until we put +something into the MVar.

One important thing to note here is that when main returns the runtime +automatically kills all of the other running threads. It doesn’t wait for them +to finish. Let’s see a simple example.

import Control.Monad
+import Control.Concurrent
+
+main :: IO ()
+main = do
+    forkIO $ do
+        threadDelay 2000000
+        putStrLn "Hello World"
+
+    putStrLn "Game over!"
+

If you run this using runhaskell or by compiling and running the binary +you’ll only see the output of Game over!. The second thread will never print +Hello World, because by the time it starts waiting the main function will +return and the runtime will kill the other thread.

We can fix this by using an MVar to make the main function wait for the +other thread to finish.

import Control.Monad
+import Control.Concurrent
+
+main :: IO ()
+main = do
+    a <- newEmptyMVar
+
+    forkIO $ do
+        threadDelay 2000000
+        putStrLn "Hello World"
+        putMVar a ()
+
+    takeMVar a
+    putStrLn "Game over!"
+

The main thread first tries to take a value out of the MVar, which blocks because there’s nothing in there yet. Meanwhile the second thread sleeps for 2 seconds, prints Hello World and puts a () into the MVar. This causes main to continue, print Game over! and exit the program. We could also do this the other way around by using putMVar on a full MVar in order to block, but the end result is the same.

main :: IO ()
+main = do
+    a <- newMVar ()
+
+    forkIO $ do
+        threadDelay 2000000
+        putStrLn "Hello World"
+        takeMVar a
+
+    putStrLn "Game over!"
+    putMVar a ()
+
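The same trick extends to waiting for several threads. Here is a rough sketch of how that might look. The names forkWithDone and sumWithWorkers are made up for this example, and I’m using forkFinally from Control.Concurrent so that the "done" MVar gets filled even if a worker throws.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar
    (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)

-- Fork an action and return an MVar that is filled once the action ends,
-- whether it returned normally or threw an exception.
forkWithDone :: IO () -> IO (MVar ())
forkWithDone action = do
    done <- newEmptyMVar
    _ <- forkFinally action (\_ -> putMVar done ())
    pure done

-- Start n workers that each add their index to a shared total, then block
-- until every one of them has signalled completion.
sumWithWorkers :: Int -> IO Int
sumWithWorkers n = do
    total <- newMVar 0
    dones <- mapM (\i -> forkWithDone (modifyMVar_ total (pure . (+ i)))) [1 .. n]
    mapM_ takeMVar dones
    readMVar total
```

Taking from each "done" MVar in turn blocks the main thread until the slowest worker has finished, which is the one-thread example from above repeated n times.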

There are many more things to cover with respect to MVar, but I’m not going to go more in depth here, since there already are other great resources on the topic.

Software Transactional Memory - STM

Last on our list is Software Transactional Memory. Much like we had IORef and MVar, STM gives us TVar, which stands for transactional variable. The way that STM works is that it builds up a log of actions that are to be performed atomically. We won’t be covering STM itself as a method for managing concurrency, since it’s a rather lengthy topic. Instead we’ll just examine the options for achieving mutable state in STM using a TVar.

Every STM operation happens inside the STM monad, which already tells us that we can chain multiple STM operations into one (since the monad instance provides us with >>=). In order to run the actual STM transaction we must use the function atomically :: STM a -> IO a, which takes any STM operation and performs it in a single atomic step.

The API for creating TVars is almost the same as for IORefs.

data TVar a
+
+newTVar    :: a -> STM (TVar a)
+readTVar   :: TVar a -> STM a
+writeTVar  :: TVar a -> a -> STM ()
+modifyTVar :: TVar a -> (a -> a) -> STM ()
+

There are also alternatives that work in the IO monad.

newTVarIO   :: a -> IO (TVar a)
+readTVarIO  :: TVar a -> IO a
+

Note that these are just convenience functions that we could have implemented ourselves using the atomically function.

newTVarIO :: a -> IO (TVar a)
+newTVarIO = atomically . newTVar
+
+readTVarIO :: TVar a -> IO a
+readTVarIO = atomically . readTVar
+

Now let’s move on to mutations. We’ll use the same example as we did with IORef, but implement it using a TVar. We have many ways to approach it, either by building one big transaction with all the steps, or by doing this in many small ones.

First let’s do one big atomically with all the steps.

bigTransaction :: IO ()
+bigTransaction = do
+    value <- atomically $ do
+        var <- newTVar (0 :: Int)
+        modifyTVar var (+1)
+        readTVar var
+
+    print value
+

There’s not much interesting going on in here, so let’s split it into smaller chunks. Even though modifyTVar is the perfect function for our use case, we can use a combination of readTVar and writeTVar to achieve the same, because atomically will make sure those two happen in a single step.

atomicReadWrite :: IO ()
+atomicReadWrite = do
+    var <- newTVarIO (0 :: Int)
+
+    atomically $ do
+        value <- readTVar var
+        writeTVar var (value + 1)
+
+    readTVarIO var >>= print
+

Since STM is a monad, we can also make this more interesting by combining two STM operations together and running those atomically.

f :: TVar Int -> STM ()
+f var = modifyTVar var (+1)
+
+twoCombined :: IO ()
+twoCombined = do
+    var <- newTVarIO (0 :: Int)
+
+    atomically $ do
+        f var
+        f var
+
+    readTVarIO var >>= print
+
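Combining operations gets more useful once more than one TVar is involved. As a small sketch (the transfer and transferDemo names are just for illustration, and this assumes the stm package is available), here is a transaction that touches two variables in one atomic step:

```haskell
import Control.Concurrent.STM
    (STM, TVar, atomically, modifyTVar, newTVarIO, readTVarIO)

-- Move an amount between two balances. Both updates live in one STM action,
-- so no other thread can observe the state between them.
transfer :: Int -> TVar Int -> TVar Int -> STM ()
transfer amount from to = do
    modifyTVar from (subtract amount)
    modifyTVar to (+ amount)

-- Run a single transfer and read back both balances.
transferDemo :: IO (Int, Int)
transferDemo = do
    a <- newTVarIO 100
    b <- newTVarIO 0
    atomically (transfer 30 a b)
    (,) <$> readTVarIO a <*> readTVarIO b
```

Because transfer is an ordinary STM value, it can be combined with other STM actions before being handed to atomically, the same way f var was run twice above.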

There’s a lot more to STM than just TVars which is why I’d encourage you, dear reader, to take a look at the following resources. You might find that it will change the way you think about concurrent programming completely.

Lens Tutorial - Introduction (part 1)

Mon, 14 Jul 2014 00:00:00 +0000

This article is the first in an upcoming series that aims to explain Haskell’s lens library and the ideas behind it in an approachable way. Don’t worry if you’re new to Haskell, the only prerequisites here should be understanding of the Functor type class, and understanding how records and algebraic data types work in Haskell.

We won’t be using the lens library in this article yet. The API we’ll develop will be exactly the same, but for the sake of learning I’ll try to show you how everything works and why it works by re-implementing it from scratch.

Keep in mind that lenses are a very advanced topic in Haskell and it takes some time to truly understand them. Don’t worry if you don’t understand everything at first read.

The motivation behind lenses

If you’re coming from an imperative language like Ruby or Java, you’re probably used to seeing code like this:

project.owner.name = "John"
+

The OOP people would call this a violation of the Law of Demeter, but let’s ignore that it’s a bad practice for now. The question here is, can we achieve something similar in Haskell?

data User = User { name :: String, age :: Int } deriving Show
+data Project = Project { owner :: User } deriving Show
+
+setOwnerName :: String -> Project -> Project
+setOwnerName newName p = p { owner = (owner p) { name = newName } }
+

Now we can already see how this is less than ideal. In order to change the name of the owner, we need to re-assign the owner field in the Project with the new User, which is updated using the record syntax. We could do this in multiple steps as follows.

Code blocks with λ> denote a GHCi session.

λ> let bob = User { name = "Bob", age = 30 }
+λ> let project = Project { owner = bob }
+
+λ> let alice = bob { name = "Alice" }
+λ> let project2 = project { owner = alice }
+

This is very tedious compared to the original Ruby example, especially since we need to keep re-building the original structure as we go deeper and deeper.

A naive lens implementation

This is where lenses come to help you out. In essence, lenses are just getters and setters which you can compose together. In a naive approach the type might look something like the following:

data NaiveLens s a = NaiveLens
+                         { view :: s -> a
+                         , set  :: a -> s -> s }
+

Following the convention of the official lens library I’ve named the type parameters s and a, where s is the object and a is the focus. In our example above the s would be Project and a would be a String, since we’re trying to change the name of the project’s owner.

Now given a lens of type NaiveLens User String we can easily change the name of a user:

λ> let john = User { name = "John", age = 30 }
+λ> set nameLens "Bob" john
+User {name = "Bob", age = 30}
+

How is such a lens implemented? It’s simply a getter and a setter.

nameLens :: NaiveLens User String
+nameLens = NaiveLens name (\a s -> s { name = a })
+
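Since I’ve claimed that these getters and setters compose, here is a rough sketch of what composition could look like for the naive type. The composeNaive function and the naive ownerLens are my own additions for illustration and don’t come from any library.

```haskell
data NaiveLens s a = NaiveLens
    { view :: s -> a
    , set  :: a -> s -> s }

data User = User { name :: String, age :: Int } deriving (Show, Eq)
data Project = Project { owner :: User } deriving (Show, Eq)

nameLens :: NaiveLens User String
nameLens = NaiveLens name (\a s -> s { name = a })

-- Hypothetical lens for the owner field, built the same way as nameLens.
ownerLens :: NaiveLens Project User
ownerLens = NaiveLens owner (\a s -> s { owner = a })

-- Compose an outer lens with an inner one: read through both getters, and
-- on write rebuild the outer structure around the updated inner value.
composeNaive :: NaiveLens s a -> NaiveLens a b -> NaiveLens s b
composeNaive outer inner = NaiveLens
    { view = view inner . view outer
    , set  = \b s -> set outer (set inner b (view outer s)) s }
```

With this, set (composeNaive ownerLens nameLens) "Bob" project does the re-building of the nested records for us, which was the tedious part of the record-syntax version.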

The problem with this approach of sticking a getter and a setter into a data type is that it doesn’t scale very well. If we wanted to do something like increment the value at the target by one, we would have to first view the current value, apply +1 to it, and then set the new value. We could encapsulate this by providing the lens with a third function called over:

over :: (a -> a) -> s -> s
+

We could use this similarly to set.

λ> let john = User { name = "John", age = 30 }
+λ> over ageLens (+1) john
+User {name = "John", age = 31}
+

ageLens :: NaiveLens User Int
+ageLens = NaiveLens age
+                     (\a s -> s { age = a })
+                     (\f s -> s { age = f (age s) })
+

The problem is that now we need to provide a getter and two setters for each lens, even if we just use one.

If you’ve been using Haskell for a while you’ve probably seen the magical function const. It’s actually not magical at all, it simply has a type of a -> b -> a, which allows us to turn over :: (a -> a) -> s -> s into set :: a -> s -> s by partially applying it, which leads to the definition of set as follows.

set :: NaiveLens s a -> a -> s -> s
+set ln a s = over ln (const a) s
+

Here’s how the whole code looks now

data NaiveLens s a = NaiveLens
+                         { view :: s -> a
+                         , over :: (a -> a) -> s -> s }
+
+set :: NaiveLens s a -> a -> s -> s
+set ln a s = over ln (const a) s
+
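To tie this back to the earlier ageLens, here is how it might be written against the two-field version of the type. This is only a sketch: the setter argument is gone, and set is recovered from over.

```haskell
data NaiveLens s a = NaiveLens
    { view :: s -> a
    , over :: (a -> a) -> s -> s }

set :: NaiveLens s a -> a -> s -> s
set ln a s = over ln (const a) s

data User = User { name :: String, age :: Int } deriving (Show, Eq)

-- Only a getter and a modifier are needed now.
ageLens :: NaiveLens User Int
ageLens = NaiveLens age (\f s -> s { age = f (age s) })
```

Both over ageLens (+1) john and set ageLens 40 john now go through the same single modifier function.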

Lenses with side effects and more

Now we can see that over is definitely useful, but what if our modifier function needs to perform some side effects? For example we might want to send the current value over the network to determine the new value. We could go on as before and add yet another function called overIO, which would look like the following:

overIO :: (a -> IO a) -> s -> IO s
+

But this means our simple pair of a getter and a setter has grown into a getter and two setters again. Not to mention that we might want to use over in more settings than just IO. Here’s how the type would look now.

data NaiveLens s a = NaiveLens
+                         { view   :: s -> a
+                         , over   :: (a -> a) -> s -> s
+                         , overIO :: (a -> IO a) -> s -> IO s }
+

This is the point where the magical generalization of what is called the van Laarhoven lens comes into play. First step is that we can write our overIO in a more general way by swapping IO for a Functor, which gives us the following type.

overF :: Functor f => (a -> f a) -> s -> f s
+

For the sake of keeping this article short I’m going to tell you that overF is everything we need in order to implement view, set, over and overIO. Which means we no longer need a Lens record type, since we’ll have just one function.

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

By making this a type alias instead of a newtype or data we get one amazing property of lenses. You can define your own lenses without depending on the lens library. Any function which has the appropriate type signature is a lens, there is no magic.

One thing to note here is that we do need to enable the RankNTypes extension for this type alias to compile. To do that simply add the following snippet to the first line of your file.

{-# LANGUAGE RankNTypes #-}
+

or if you’re following along in GHCi type :set -XRankNTypes. I won’t be explaining this in this article since it’s quite a complicated topic, but if you’re interested in learning more, a simple Google search will yield a lot of good results.

Implementing `over`, `set` and `view` in terms of `Lens s a`

Let’s summarize before we move on. We started with an idea that a lens represents a getter and a setter into some data type. Then we generalized the setter to work with functions (using over). Last we realized that over is not good enough when we want to do side effects, so we moved to overIO and finally generalized it to the van Laarhoven lens of Functor f => (a -> f a) -> s -> f s.

So far I’ve only told you that our new Lens s a can behave like over, set and view, but we need to prove it to really understand why. In order to do this we’ll make use of two Functor instances that come from the base library, namely Data.Functor.Identity and Control.Applicative.Const. Let’s start with the simplest one, that is implementing over with the Identity functor.

`over` with `Identity`

First of all, here’s the implementation of Identity.

newtype Identity a = Identity { runIdentity :: a }
+
+instance Functor Identity where
+  fmap f (Identity a) = Identity (f a)
+

The reason why this is useful is because we can put a value in, let it behave as a functor, and then take the value out.

The final type of over that we’re looking for is over :: Lens s a -> (a -> a) -> s -> s. We can read that as: Given a lens focusing on an a inside of an s, and a function from a to a, and an s, I can give you back a modified s from applying the function to the focus point of the lens.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f s = _
+

If you’re on GHC 7.8 or later you can copy the exact snippet above and get an error telling you what type is needed in place of _ (this functionality is provided by so-called typed holes). Also don’t forget that you need to add the type alias for Lens s a and enable the RankNTypes extension as mentioned above.

We’ll inline the Lens type synonym, just so that we can see what is really going on. Don’t worry if the type looks scary, it will all make sense in a short while.

over :: (Functor f => (a -> f a) -> (s -> f s)) -> 
+        (a -> a) -> s -> s
+over ln f s = _
+

I’ve added a few parentheses, especially around the s -> f s, to make it clear as we go along with partial applications. Keep in mind that Lens is just a function, nothing more.

We only have one function of the type a -> f a available here to pass into the lens ln, and that is Identity.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f s = _ (ln Identity)
+

Using GHCi to play with types

If you want to play along in GHCi, there’s a neat little trick you can do to interactively play with types. Say that you want to see the type of ln Identity:

λ> let ln = undefined :: (Functor f => (a -> f a) -> (s -> f s))
+λ> :t ln Identity
+:: s -> Identity s
+

The reason why this works is because undefined can take on any type. Since we’re just trying to make the types align, you won’t get an error from trying to evaluate the undefined, you’ll just get a type error. This way you can keep trying to partially apply things to see if the types match as you expect.

Anyway, moving on. We haven’t really used our function f yet, and there will be no more a to apply it to once we give something to the lens ln. This is why we need to apply it before we stick in the Identity, or compose it with the Identity to be specific.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f s = _ (ln (Identity . f))
+

Now our current type hole is (s -> f s) -> s, which means we can stick in our s. To make this syntactically more pleasing we’ll replace some parentheses with $.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f s = _ $ ln (Identity . f) s
+

Hang in there, we’re almost done. The last thing we need, as our type hole tells us, is a function f s -> s, which means we basically need to rip off the functor. This is easy to do as we’re using the Identity functor, so we just apply runIdentity.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f s = runIdentity $ ln (Identity . f) s
+

If you’re feeling adventurous, we can rewrite this using point free style.

over :: Lens s a -> (a -> a) -> s -> s
+over ln f = runIdentity . ln (Identity . f)
+

`view` with `Const`

Now let’s move on to view, where the type is simply view :: Lens s a -> s -> a. We can read this as: Given a lens that focuses on an a inside of an s, and an s, I can give you an a.

This part is probably the most magical, since the type of the Lens s a is (a -> f a) -> s -> f s and we’re trying to implement something that’s s -> a, which means we need to have a way to turn the final f s into an a. The key to this is the Const functor.

newtype Const a b = Const { getConst :: a }
+
+instance Functor (Const a) where
+  fmap _ (Const a) = Const a
+

Let’s break this down into steps and first explain how Const works. Const is a wrapper which takes a value, hides it deep inside, and then pretends to be a functor containing something else, which is why it ignores the function you’re trying to fmap over it. Here’s an example:

λ> :t Const "hello"
+:: Const String b
+

We’ve hidden a "hello" string inside a Const, now let’s try to apply a boolean function to it using fmap.

λ> let boolBox = fmap (&& False) (Const "hello")
+λ> :t boolBox
+Const [Char] Bool
+

The Const has taken on the type Const String Bool. If we fmap a function Bool -> Double over it we’ll get a Const String Double.

λ> :t fmap (\_ -> 1.2 :: Double) boolBox
+:: Const String Double
+

The important thing to keep in mind here is that the Const simply ignores the function we’re fmapping and takes on the new type, while keeping our original String safe. We can extract it back at any time we want, no matter how many things we’ve fmapped.

λ> getConst boolBox
+"hello"
+λ> getConst $ fmap (\_ -> 1.2 :: Double) boolBox
+"hello"
+

The actual `view` implementation

Let’s do this using type holes again.

view :: Lens s a -> s -> a
+view ln s = _
+

We can approach this the same way as we did before when implementing over using Identity. First of all, here’s the type of Lens s a again in case you forgot: Functor f => (a -> f a) -> s -> f s.

If you squint hard enough you can see that if we somehow pass a function to ln, we’ll get back another function of the type s -> f s, which we can give our s, and then the only thing remaining is to extract the resulting a out of the f s. Again the only function that fits here is Const.

view :: Lens s a -> s -> a
+view ln s = _ $ ln Const
+

The type of the hole here is (s -> f s) -> a, which means we can apply our s on the right side as we did with over.

view :: Lens s a -> s -> a
+view ln s = _ $ ln Const s
+

Now all we’re left with is f s -> a, and because we know that the f s is actually Const a s we can get back the a using getConst.

view :: Lens s a -> s -> a
+view ln s = getConst $ ln Const s
+

And there you go, we got ourselves a view. I won’t be showing how to implement set step by step, since it can be trivially defined in terms of over, which is good enough for us.

set :: Lens s a -> a -> s -> s
+set ln x = over ln (const x)
+

Writing our own lenses

In order to use lenses we actually need to have some lenses. As said earlier, we do not need the lens library to define a new lens, we only need a function with the type of Functor f => (a -> f a) -> s -> f s. Let’s make one!

We’ll start by implementing the _1 lens, which focuses on the first element of a pair. The type will be Lens (a,b) a or specifically Functor f => (a -> f a) -> (a,b) -> f (a,b). In other words: given a pair of (a,b) the lens focuses on the first element of the pair, which is a.

_1 :: Functor f => (a -> f a) -> (a,b) -> f (a,b)
+_1 f (x,y) = _
+

An interesting thing about pure functions in Haskell is that more often than not, there is only one way to implement a function so that it typechecks. We can use the types as we did earlier to guide us while implementing this.

Ok let’s get going. We have three values available (via the function parameters), f :: a -> f a, x :: a and y :: b. The only thing we can do here is apply f to x.

_1 :: Functor f => (a -> f a) -> (a,b) -> f (a,b)
+_1 f (x,y) = f x
+

This will fail to typecheck, since we’re trying to return f a instead of f (a,b). What else can we do now? We know f is a Functor, which means we can use fmap. We also know that we need to somehow use y to compose the result. If you think about this for a while, all we can really do is fmap some function on the result of f x.

_1 :: Functor f => (a -> f a) -> (a,b) -> f (a,b)
+_1 f (x,y) = fmap _ (f x)
+

The result is that the type of _ in this case must be a -> (a, b). That’s it, we only have one thing of type b, which is y, and the a we can take just from the parameter passed to the lambda, hence giving us the following.

_1 :: Functor f => (a -> f a) -> (a,b) -> f (a,b)
+_1 f (x,y) = fmap (\a -> (a, y)) (f x)
+

Whoa, did we just write an actual lens? I believe we did sir. Let’s test things out!

Using lenses

Now that we got ourselves a view and _1 lens, let’s play!

λ> view _1 (1,2)
+1
+

We can also use set and over to change the value

λ> set _1 3 (1,2)
+(3,2)
+λ> over _1 (+3) (1,2)
+(4,2)
+
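If you’d like everything so far in one file, here is a self-contained sketch. I’ve written the alias with an explicit forall f., which newer GHC versions expect, and imported Const from Data.Functor.Const (its home in recent versions of base). The _2 lens is my own addition, following the same pattern as _1.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens s a -> s -> a
view ln s = getConst $ ln Const s

over :: Lens s a -> (a -> a) -> s -> s
over ln f = runIdentity . ln (Identity . f)

set :: Lens s a -> a -> s -> s
set ln x = over ln (const x)

_1 :: Lens (a, b) a
_1 f (x, y) = fmap (\a -> (a, y)) (f x)

-- Same pattern as _1, but the focus is the second component.
_2 :: Lens (a, b) b
_2 f (x, y) = fmap (\b -> (x, b)) (f y)
```

These also work on nested pairs, for example view (_1 . _2) ((1, 'x'), True) focuses on the 'x'.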

Let’s see how to define a lens for the original User and Project types.

data User = User { name :: String, age :: Int } deriving Show
+data Project = Project { owner :: User } deriving Show
+

We’ll start with a lens for the User’s name, which simply has the type Lens User String. There’s no magic here, we’ll just follow the same pattern as we did with the _1 lens.

nameLens :: Lens User String
+nameLens f user = fmap (\newName -> user { name = newName }) (f (name user))
+

As you can see this is just mechanical work. We can define the other two lenses for age and owner by simply copy pasting the first one and changing a few things around.

ageLens :: Lens User Int
+ageLens f user = fmap (\newAge -> user { age = newAge }) (f (age user))
+
+ownerLens :: Lens Project User
+ownerLens f project = fmap (\newOwner -> project { owner = newOwner }) (f (owner project))
+
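Remember that the whole point of the Functor generalization was to allow effects. Since a lens is just a function, we can apply it directly to an effectful modifier. The sketch below does this with Maybe and with IO. The helper names addYears, addYearsToUser and birthdayIO are made up for this example.

```haskell
{-# LANGUAGE RankNTypes #-}

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

data User = User { name :: String, age :: Int } deriving (Show, Eq)

ageLens :: Lens User Int
ageLens f user = fmap (\newAge -> user { age = newAge }) (f (age user))

-- A modifier that can fail: reject updates that would make the age negative.
addYears :: Int -> Int -> Maybe Int
addYears n a
    | a + n < 0 = Nothing
    | otherwise = Just (a + n)

-- Applying the lens directly picks f = Maybe, so the whole update fails
-- if the modifier fails.
addYearsToUser :: Int -> User -> Maybe User
addYearsToUser n = ageLens (addYears n)

-- The same lens with f = IO plays the role of overIO: log the old value
-- before producing the new one.
birthdayIO :: User -> IO User
birthdayIO = ageLens (\a -> putStrLn ("old age: " ++ show a) >> pure (a + 1))
```

No new record field was needed for either of these, which is exactly what the overF type promised.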

Composing lenses together

Because lenses are just functions (remember that Lens s a is just a type alias) we can compose them using the ordinary function composition operator (.).

ownerNameLens :: Lens Project String
+ownerNameLens = ownerLens.nameLens
+

Let’s test this out:

λ> let john = User { name = "John", age = 30 }
+λ> let p = Project { owner = john }
+λ> view ownerNameLens p
+"John"
+λ> set ownerNameLens "Bob" p
+Project {owner = User {name = "Bob", age = 30}}
+
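It might look backwards that the outer lens goes on the left of the composition. One way to read it: ageLens turns a function on Int into a function on User, and ownerLens turns a function on User into a function on Project, so ownerLens . ageLens goes all the way from Int to Project. A small sketch, where ownerAgeLens and birthday are names I made up:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

over :: Lens s a -> (a -> a) -> s -> s
over ln f = runIdentity . ln (Identity . f)

data User = User { name :: String, age :: Int } deriving (Show, Eq)
data Project = Project { owner :: User } deriving (Show, Eq)

ageLens :: Lens User Int
ageLens f user = fmap (\newAge -> user { age = newAge }) (f (age user))

ownerLens :: Lens Project User
ownerLens f project = fmap (\newOwner -> project { owner = newOwner }) (f (owner project))

-- Outer lens on the left, inner lens on the right.
ownerAgeLens :: Lens Project Int
ownerAgeLens = ownerLens . ageLens

-- Increment the age of a project's owner in one go.
birthday :: Project -> Project
birthday = over ownerAgeLens (+ 1)
```

Compare this with the setOwnerName function from the beginning of the article: the nested record updates are gone.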

Conclusion of part 1

Congratulations to you if you’ve read this far, you now have a good understanding of how the basic Lens s a works. This is not the end though, since lenses are a very large subject and there is a lot of ground to cover. The followup posts to this one will cover the more general Lens s t a b type, folds, traversals, prisms, isos, using template haskell to generate lenses, and much more!

If you’re curious especially about the Lens s t a b type and what it means, it’s basically just a small generalization of what we’ve developed here. Compare the following two:

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s
type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

This might look weird at first, but it’s not if you apply it to a specific data type, such as:

Lens (Int, String) (Double, String) Int Double
+(Int -> f Double) -> (Int, String) -> f (Double, String)
+

It simply allows you to change the type of the underlying structure, but as I said earlier, we’ll cover this more in one of the upcoming blog posts.
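As a quick taste, here is a minimal sketch of a type-changing _1 together with a matching over. The halve function is only there for illustration. It turns an (Int, String) into a (Double, String), the same shape as the specialized type shown above.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- The first component may change its type from a to b; the second stays put.
_1 :: Lens (a, c) (b, c) a b
_1 f (x, y) = fmap (\b -> (b, y)) (f x)

over :: Lens s t a b -> (a -> b) -> s -> t
over ln f = runIdentity . ln (Identity . f)

-- Hypothetical example: the modifier changes Int into Double.
halve :: (Int, String) -> (Double, String)
halve = over _1 (\n -> fromIntegral n / 2)
```

Note that the body of _1 is exactly the same as before, only its type got more general.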

Using Phantom Types in Haskell for Extra Safety - Part 2

Thu, 10 Jul 2014 00:00:00 +0000

I’ve received a lot of reactions to the previous blog post about Phantom Types over the past two days, which is why I’ve decided to summarize what I’ve learned in another blog post.

First, here’s a summarized problem from the previous post. We have a Message which can be either PlainText or Encrypted. We’ve used Phantom Types to enforce this in the type system:

data Message a = Message String
+
+data PlainText
+data Encrypted
+
+send :: Message Encrypted -> IO ()
+encrypt :: Message PlainText -> Message Encrypted
+decrypt :: Message Encrypted -> Message PlainText
+

Can newtype do the same?

Many people mentioned that we could use Haskell’s newtype to do the same, here’s how that would look.

data Message = Message String
+newtype PlainTextMessage = PlainTextMessage Message
+newtype EncryptedMessage = EncryptedMessage Message
+
+send :: EncryptedMessage -> IO ()
+encrypt :: PlainTextMessage -> EncryptedMessage
+decrypt :: EncryptedMessage -> PlainTextMessage
+

This example would work perfectly fine, and it’s how you’d probably solve this in a statically typed language with no option for representing Phantom Types.

But there’s one downside to this solution. Our new PlainTextMessage and EncryptedMessage are no longer related, which means we can’t write a function that operates on both of them. Why would we need that? I’m glad you asked! Here’s how a simple length function would look in Haskell.

length :: [a] -> Int
+length [] = 0
+length (x:xs) = 1 + length xs
+

In order to calculate the length of a list, we do not care what is in the list. The same way if we wanted to calculate a messageLength, we don’t care if the message has been encrypted or not, we just want to count the characters. This is dead simple if we had Phantom Types, but it would be very hard using the newtype solution, since PlainTextMessage and EncryptedMessage are parametrically (is that even a word?) not the same thing.

messageLength :: Message a -> Int
+messageLength (Message m) = length m
+

As you can see, we simply ignore the type parameter a of the Message type and calculate the length of the inner String.

We could achieve the same in the newtype solution using type classes, but it would be unnecessarily more complicated. Phantom types just fit this problem more naturally.
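For comparison, here is roughly what the type class route could look like. The HasMessage class and its toMessage method are hypothetical names I picked for this sketch.

```haskell
data Message = Message String
newtype PlainTextMessage = PlainTextMessage Message
newtype EncryptedMessage = EncryptedMessage Message

-- Hypothetical class: anything that wraps a Message can expose it.
class HasMessage m where
    toMessage :: m -> Message

instance HasMessage PlainTextMessage where
    toMessage (PlainTextMessage m) = m

instance HasMessage EncryptedMessage where
    toMessage (EncryptedMessage m) = m

-- Works for both wrappers, but only because each one has an instance.
messageLength :: HasMessage m => m -> Int
messageLength wrapped = case toMessage wrapped of
    Message text -> length text
```

It works, but every new kind of message needs another newtype and another instance, whereas the Phantom Types version of messageLength needed neither.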

GADTs

Some people have noted that we could achieve the same thing using GADTs (Generalised Algebraic Data Types), which is an extension to Haskell’s type system. I didn’t want to dive into this at first, since GADTs are much harder to understand for non-Haskell programmers, but let’s show a simple implementation of this example.

data Encrypted
+data PlainText
+
+data Message a where
+  EncryptedMessage :: String -> Message Encrypted
+  PlainTextMessage :: String -> Message PlainText
+

The difference here is that we’re basically creating typed value constructors which automatically enforce the resulting type of the Message. For example if we do EncryptedMessage "hello", it will automatically have the type of Message Encrypted. This might seem the same as the newtype solution mentioned above, but by using GADTs we can still write a generic messageLength function, exactly as we did previously.

messageLength :: Message a -> Int
+messageLength (EncryptedMessage m) = length m
+messageLength (PlainTextMessage m) = length m
+

The difference here is that we need to pattern match on both of the constructors. An implementation of the send function might look something like this.

send :: Message Encrypted -> IO ()
send (EncryptedMessage m) = putStrLn m -- stand-in for the real send logic

If you’re a bit familiar with Haskell, you might be thinking that this function is not total and could produce a non-exhaustive pattern match error. But in fact it can’t, because it expects its argument to be of the type Message Encrypted. If you try to call it with a PlainText message it would be a type error.

send (PlainTextMessage "hello") -- type error
+
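Here is the GADT version put together as one compilable sketch. The encrypt and decrypt bodies are placeholders that just reverse the string, which is obviously not real encryption, and wireFormat is a hypothetical stand-in for whatever would actually be sent.

```haskell
{-# LANGUAGE GADTs #-}

data Encrypted
data PlainText

data Message a where
    EncryptedMessage :: String -> Message Encrypted
    PlainTextMessage :: String -> Message PlainText

-- Placeholder "cipher" for illustration only: reversing is not encryption.
encrypt :: Message PlainText -> Message Encrypted
encrypt (PlainTextMessage m) = EncryptedMessage (reverse m)

decrypt :: Message Encrypted -> Message PlainText
decrypt (EncryptedMessage m) = PlainTextMessage (reverse m)

messageLength :: Message a -> Int
messageLength (EncryptedMessage m) = length m
messageLength (PlainTextMessage m) = length m

-- Hypothetical stand-in for the send logic: the text that would go out.
-- Only one pattern is needed, since no other constructor can produce a
-- Message Encrypted.
wireFormat :: Message Encrypted -> String
wireFormat (EncryptedMessage m) = m

send :: Message Encrypted -> IO ()
send = putStrLn . wireFormat
```

Calling send (encrypt (PlainTextMessage "hello")) compiles, while dropping the encrypt call is rejected by the type checker.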

This is one of the beauties of GADTs. If you’re interested in learning more about them, I recommend reading the Haskell Wiki page as well as many others. I’ll probably write another followup article that explains just GADTs, just because they’re such a rich feature.

Tell don’t ask™

Patrick Dlogan actually took the time to write an article as a reaction to mine, where he shows a solution in which messages know how to encrypt themselves, which allows you to get rid of the if check in a dynamic language. Here’s also a similar response from comments on Lobste.rs.

Message = Struct.new(:text) do
+  def ciphertext
+    @ciphertext ||= # encrypt plain text logic
+  end
+end
+
+def send_message(message)
+  # send using message.ciphertext
+end
+

We could label both of these solutions as a kind of tell don’t ask™ principle. Basically what it means is that instead of performing the encryption first, and then sending the message out, the encryption step is being run directly when sending the message.

Here’s how something similar might look in Haskell. We’re simply doing the encryption when sending the message.

send :: Message -> IO ()
+send (Message m) = someMagic (encrypt m)
+

Now this might make sense in some cases, but what if there is more than one place where a message can get encrypted? We could solve that by making encrypt do nothing for already encrypted messages, but there are downsides to doing that.

First of all it’s important to realize that this is restructuring how the program works. If encrypt is something that can fail we’ve effectively moved that failure to a different place. If encrypt was throwing an exception that had to be handled, now that error handling needs to happen in the place of the caller of send (assuming it’s not something we can deal with right in place).

Another more important reason why this wouldn’t always be possible is that the code for constructing messages might be outside of our control. Say that all of the logic is hidden in a library which you can’t change for various reasons, or these are just some data types you’re receiving from an API.

The library could still make use of Phantom Types to safely tag the values on the type level, while you wouldn’t be able to apply this tell don’t ask approach, since the encrypt logic is not in your control.

I guess the TL;DR here is that by using the type system in a smart way we can add additional checks that are verified at compile time, that increase the safety of our programs. It’s not a technique for re-structuring or re-designing a portion of the codebase.

Using Phantom Types for Extra Safety

Tue, 08 Jul 2014 00:00:00 +0000

If you’ve been programming in a dynamic language, you’ve probably heard that type systems can catch more errors before your application even gets run. The more powerful the type system is, the more you can express in it. And because we’re talking about Haskell, we have a great number of tools at our disposal when trying to express things in terms of the types.

Why is this important? Sometimes a function has an expectation about the value that it’s receiving. In most imperative languages those expectations are implicit and up to the programmer to hold, such as the following:

def foo(bar)
+  bar.baz
+end
+

In this example the function foo implicitly expects an object which is not nil. If you call foo(nil), you’ll get an exception at runtime. To combat this we usually write unit tests to verify that our system will never get into such a state that the function would get passed in a nil. Now this is a very simple example, let’s take a look at a more complicated one.

Imagine you’re writing a service which receives messages from users, encrypts them, and sends them on through an unsecured channel. The messages are both being sent and received as base64 encoded strings, so you can’t easily tell if a message has been encrypted by just inspecting it.

Here’s how we could represent the message in Haskell and in Ruby, just so that we can compare the code.

data Message = Message String
+

class Message
+  attr_accessor :text
+
+  def initialize(text)
+    @text = text
+  end
+end
+

Now this is all well and good, but we also want to keep track if the message has been encrypted or if it is still in plain text. To do this in Haskell we’ll use a simple Algebraic Data Type, while in Ruby we’ll add an additional attribute called encrypted, which will default to false.

data Message = PlainText String | Encrypted String
+

class Message
+  attr_accessor :text, :encrypted
+
+  def initialize(text)
+    @text = text
+    @encrypted = false
+  end
+end
+

While the Haskell version is less verbose, it doesn’t give us many more safety guarantees at this point. Let’s say we want to define a function which sends a message. We want it only to accept a message that has been encrypted, since sending a plain text message is unsafe and should not be allowed.

def send_message(message, recipient)
+  if message.encrypted
+    # send logic
+  else
+    raise ArgumentError, "Can’t send a plain text message"
+  end
+end
+

send :: Message -> Recipient -> IO ()
send (Encrypted m) recipient = putStrLn m -- stand-in for the real send logic
send (PlainText _) _ = undefined

It doesn’t really matter how we chose to represent this in Haskell. Even if we used a Maybe or Either to handle the failure, we would still have to handle this at runtime. Which means only one thing: this function needs to be prepared for the edge case where we pass in a message in an invalid state, and we would also need to test the error handling. This is as far as we can go with Ruby, since there’s no way to enforce more structure into the program.

But wouldn’t it be much nicer if a program that’s trying to call send with a PlainText message would get rejected by the type checker? Such a program is not valid in our business domain and it shouldn’t compile. If we manage to do that, we can save ourselves the error handling, and also writing tests for the error handling.

To be able to do this we need to express the relationship between the Encrypted message and the send function at the type level. The trick that allows us to do this is called Phantom Types, but to understand those, first let’s take a look at simple parametric data types in Haskell. They are very similar to templates or generics in C++/C#/Java and many other languages. Here’s a simple parametric type:

data Maybe a = Just a | Nothing
+

The a on the left side is simply a type parameter. If we choose to create a value such as Just 3, it would have the type of Maybe Int.

Phantom Types

A type is called a Phantom Type if it has a type parameter which only appears on the left hand side, but is not used by any of the value constructors. Here’s how we could modify our Message type to make it into a Phantom Type.

data Message a = Message String
+

This allows us to have things like Message Int, Message String, Message (Maybe Char), and so on. In itself it might not look appealing, since no matter what type we use it will still have a single value constructor which works with Strings. But let’s expand this further by adding two empty data types, one for each type of the message.

data Encrypted
+data PlainText
+

This gives us an option to create both Message Encrypted and Message PlainText types. Remember that even if we’re not using the type parameter in any of the constructors, it is still verified by the type system, which means we can give send, encrypt and decrypt the following signatures.

send :: Message Encrypted -> Recipient -> IO ()
+encrypt :: Message PlainText -> Message Encrypted
+decrypt :: Message Encrypted -> Message PlainText
+

The last thing we would need to do to make this completely safe is to make the constructor for Message private and only export a function for creating a new instance of the type. This makes it impossible to change the state of a Message in any way other than by using our encrypt and decrypt functions, because you wouldn’t be able to use pattern matching to extract the inner value. The function for creating a new Message could look something like this:

newMessage :: String -> Message PlainText
+newMessage s = Message s
+

Now armed with the power of Phantom Types, the following would be rejected by the type system, making it impossible to send plain-text messages.

send (newMessage "hello!") "john@example.com"
+

A similar thing could also be implemented using Generalised Algebraic Data Types (GADTs), but that’s outside the scope of this article. If you’re interested in learning more, I recommend checking out the Haskell Wiki article about Phantom Types, which has some great examples, or the WikiBooks entry.

Update: As was just pointed out in the comments on Lobste.rs, it’s worth noting that all of this safety comes for free. The types are erased once the program type checks and compiles, so there is no runtime overhead. This might not be so obvious to people used to programming in dynamic languages.

Evil Mode: How I Switched From VIM to Emacs

Mon, 23 Jun 2014 00:00:00 +0000

I’ve been a long time VIM user. I use it every day for all of my work and side projects, writing blog posts, writing other content, sometimes even for writing emails if the text is long enough. VIM is like my home and I’m deeply in love with it.

The problem is that VIM is a horrible IDE. It’s an amazing and super productive editor, but it really sucks at doing IDE-like things. Now you might be thinking I’m a noob who needs to click on good looking buttons in RubyMine to get things done. No, that’s not what I mean by IDE … let me explain.

Most of my work in the past years has been either Ruby or JavaScript. Those are dynamic languages with close to no IDE support. You don’t usually run a REPL and eval your Rails app, but instead write a test, write some code, hit a button to run the test, and occasionally reload the browser.

Most of the work is heavy editing of large amounts of source code, which is what VIM excels at. Running the tests is easy as well, since you can just bind it to a key with a single line of vimscript:

nnoremap <leader>t :!rspec %<cr>
+

This is all nice, but what if you want to use a language which has a REPL? What if you want to display errors inline every time you save a file?

I can actually answer both of these right now. There were numerous attempts to bring something REPL-like to VIM, but usually the outcome is unusable. Most people I know don’t even bother with this and use tmux. Which is fine if this is the only problem you’re trying to solve, but there’s more. Let’s go to problem #2.

If you’ve been using VIM for a while, you probably immediately thought “Doesn’t syntastic already display errors inline?” Yes it does, but it does so synchronously, which means that if you’re using a checker that takes longer than half a second, you will have a bad time. This is especially true for ghc-mod, which takes up to 5 seconds on my machine. There are alternatives that make this faster, but this is still talking about a fast dev machine. When I’m working on my tiny 11" Lenovo, it’s just impossible to have Syntastic turned on.

Evil Mode

I’ve been using Emacs on and off for about 2-3 years now. But there was always the feeling of being slow compared to VIM. Especially once you get really fast at navigating in VIM, it’s hard to use anything else.

Then one night I decided to give Evil Mode a try. Every other editor supports some sort of VIM emulation, but every single one of them I tried fell short. But not Emacs. I am completely blown away by the level of integration Evil Mode has. There are even ports of the most popular VIM plugins to Evil Mode, such as Tim Pope’s vim-surround.

Almost everything works out of the box, and even more, it works really well with the built-in Emacs key-bindings. You can search, record macros, jump around like you would in VIM, but you get the full power of Emacs at your hand as well. This means you can be in the middle of typing something and immediately press C-c C-l while still in insert mode to load your file into the REPL. (Yes you still have to save the file, but that can be done while in insert mode as well.)

There are things that I love about Emacs that I was missing in VIM, and there are things about VIM that I was missing in Emacs, but Evil Mode does such a great job at bringing the two together. It’s hard to describe this feeling in words. I can only imagine that the developers behind it are very skilled VIM users.

Disadvantages

While most things work really nicely with the default Emacs keymap, there is one VIM feature that I had to disable, and that is q. The reason for this is that q is used in many places in Emacs to close things, and sometimes it so happens that Evil Mode is turned on in that window at the same time, which results in recording a macro instead of closing the window.

Here’s my complete list of customizations.

(define-key evil-normal-state-map (kbd ",f") 'projectile-find-file)
+(define-key evil-normal-state-map (kbd ",,") 'evil-buffer)
+(define-key evil-normal-state-map (kbd "q") nil)
+
+(define-key evil-insert-state-map (kbd "C-e") nil)
+(define-key evil-insert-state-map (kbd "C-d") nil)
+(define-key evil-insert-state-map (kbd "C-k") nil)
+(define-key evil-insert-state-map (kbd "C-g") 'evil-normal-state)
+(define-key evil-visual-state-map (kbd "C-c") 'evil-normal-state)
+
+(define-key evil-motion-state-map (kbd "C-e") nil)
+(define-key evil-visual-state-map (kbd "C-c") 'evil-exit-visual-state)
+

Most of these are just minor inconveniences, though I really do appreciate the level of customization available in Evil Mode. If anything misbehaves, it’s easy to just C-h k and press the key, to see which function gets invoked. After that it’s just a matter of looking at the source code, or simply overriding the key in the specific mode.

Customizability (is that even a word?) is one of the reasons why I’m starting to like Emacs more and more. Even though I’m not a big Lisp fan, I still prefer it to vimscript any day of the week. Most of the Elisp code out there is very readable and well commented, so it’s not that hard to dig into the source of the package you’re using and try to figure some things out. Having more customization options is also a disadvantage to some extent, since Emacs packages are usually more complex than their VIM counterparts and it takes more time to set things up.

I don’t think I’ll ever give up VIM entirely, mostly because I’ve invested many years into perfecting my Ruby workflow. That being said, I don’t mind switching to Emacs for other languages, where the support is much better to begin with, and the price to pay in terms of differences between Evil Mode and VIM is quite small. Maybe one day I will be able to write Ruby in Emacs as well.

Yesod is Fun

Thu, 15 May 2014 00:00:00 +0000

I’ve been trying many Haskell web frameworks over the past few weeks. I wrote one small app with Simple, and almost wrote another one with Scotty. Then I decided it was time to take a look at the big guys: Happstack, Snap and Yesod.

First I tried Happstack, which felt kind of OK and very understandable, mostly because it doesn’t seem to be trying to do much magic. This is really great for learning, but then I stumbled when I found that it’s not actually being developed on GitHub. I know this in itself isn’t an argument against Happstack, but given that it seems to be the least popular and least used out of the three, it definitely doesn’t help it get higher on my list.

Next came Snap, which I have really mixed feelings about. At first Snap feels very simple and well documented, and even has a book, which I immediately bought. There is even an IRC room with more than 20 people in it. I was so excited. But then small things started to pile up and I became less and less excited.

While the documentation seems to be sufficient, I couldn’t really find the answers to many of my questions, the IRC room, while full of people, is very idle, and the GitHub repo seems quite dead. I will definitely give Snap another try in the coming weeks.

And then came Yesod. I’ve been avoiding Yesod for quite some time, mostly because I assumed it’s a big ball of magic, as Rails is, and I wanted to avoid that in the beginning. I also tried it about a year ago and failed, but this time I decided to really dig in and write something in Yesod.

I haven’t really made much progress yet, but there’s an interesting factor that was missing from the other frameworks (apart from Simple), and that is fun. Yesod is fun.

Everything worked out of the box and the way I expected. Even the automatic migrations, which I didn’t like at first, surprised me by doing the right thing every time I used them. While there is a lot of Template Haskell being used, it actually does make a lot of sense after studying it for a while. It might make some things a bit more obscure, but I find it being used very reasonably and in ways which make sense to me.

Also, every time I check out any of the repos for Yesod there has been a new commit, usually from a few hours ago. Compare this to Snap, which I would say is fairly inactive, and I have yet another bonus point for Yesod. The #yesod channel on IRC is also really active, and the documentation is outstanding. There’s also FP Complete, which is another huge bonus point for Yesod.

Put all of this together and I have a clear winner, at least for now.

Duplication in Tests Is Sometimes Good

Sun, 04 May 2014 00:00:00 +0000

Having powerful tools like RSpec gives us so much power that what was once a nice suite of readable specs can become a huge unreadable mess, just because someone tried to DRY it up.

When writing your production code, there’s a good reason to keep the code DRY. Most of the time duplication in your code can be a smell. But just because something sometimes smells, it doesn’t mean you should try to fix it all the time. This becomes even more important when writing tests.

Let’s compare these two examples:

specify :draft? do
+  build(:post, status: Post::DRAFT).should be_draft
+  build(:post, status: Post::PUBLISHED).should_not be_draft
+end
+
+specify :published? do
+  build(:post, status: Post::DRAFT).should_not be_published
+  build(:post, status: Post::PUBLISHED).should be_published
+end
+

This looks ok, but could we maybe refactor it a little bit to avoid the duplication there?

let(:draft_post) { build(:post, status: Post::DRAFT) }
+let(:published_post) { build(:post, status: Post::PUBLISHED) }
+
+specify :draft? do
+  draft_post.should be_draft
+  published_post.should_not be_draft
+end
+
+specify :published? do
+  draft_post.should_not be_published
+  published_post.should be_published
+end
+

Now that we don’t have that ugly duplication anymore, let’s ask ourselves whether the refactored test is really better. There is less duplication, but we’ve split each test in two parts. The problem comes when one of these tests fails, and suddenly you need to look around in the whole file to see where the setup is being performed.

This becomes even worse if you use more than one let to set up your specs, such as:

let(:user1) { create(:user) }
+let(:user2) { create(:user) }
+let(:post) { create(:post, user: user1) }
+let(:admin) { create(:user, :admin) }
+
+it "doesn't allow any other user to delete a post" do
+  user2.can_delete?(post).should be_false
+end
+
+it "allows admins to delete any post" do
+  admin.can_delete?(post).should be_true
+end
+

Imagine you have 20 tests like this for each context, and then define some other variables in the context above. A single failure will force you to scroll up and down and look around in 500 lines of test code, instead of just seeing everything in one place, such as:

it "doesn't allow any other user to delete a post" do
  user1 = create(:user)
  user2 = create(:user)
  post = create(:post, user: user1)

  user2.can_delete?(post).should be_false
end

it "allows admins to delete any post" do
  post = create(:post)
  admin = create(:user, :admin)

  admin.can_delete?(post).should be_true
end

Is there more duplication? Yes. But if test #2 fails tomorrow, you’ll see what exactly is being tested, instead of having to spend 5 minutes goofing around in the spec file to see what is actually going on.

Light Table Plugin Tutorial

Mon, 13 Jan 2014 00:00:00 +0000

I’ve been playing around with Light Table since the day its source code was released (even made a tiny Ruby plugin).

First of all, Light Table is based on the BOT architecture. Which means there are three core concepts: behaviors, objects and tags. If you have any experience with Node.js or event driven programming, you’ll have an easy time understanding the concepts.

Imagine you have a button which listens for a click event and displays a notice to the user when it’s clicked.

Using jQuery that could be as simple as the following:

<input class="my-button" type="submit" value="Do work"/>
+

$(".my-button").click(function() {
+  showProgress("I'm doing some heavy lifting");
+});
+

But there are problems with this approach, especially from Light Table’s point of view. First of all, there’s no way to see the callback after it’s been attached to the element, which also means you can’t easily change it at runtime. BOT allows us to decouple the object (the button) from the behavior that runs when an event (the click) is raised on it.
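To make the decoupling a bit more concrete before switching to ClojureScript, here’s a tiny plain JavaScript sketch of the idea. This is not Light Table’s API; behaviors, defineBehavior, createObject and raise are all names I made up for the illustration, and tags are left out entirely. The point is only that an object stores the names of its behaviors, so a behavior can be looked at and replaced while the object is alive.

```javascript
// Toy model of objects and behaviors, NOT Light Table's real API.
// Every name in this block is hypothetical.
const behaviors = {};

// Register (or replace) a behavior under a name.
function defineBehavior(name, { triggers, reaction }) {
  behaviors[name] = { triggers, reaction };
}

// An object only stores the *names* of its behaviors.
function createObject(behaviorNames) {
  return { behaviors: behaviorNames };
}

// Raising an event looks the behaviors up at call time,
// so redefining a behavior changes what existing objects do.
function raise(obj, event, ...args) {
  const results = [];
  for (const name of obj.behaviors) {
    const behavior = behaviors[name];
    if (behavior && behavior.triggers.includes(event)) {
      results.push(behavior.reaction(obj, ...args));
    }
  }
  return results;
}

defineBehavior("work-on-click", {
  triggers: ["clicked"],
  reaction: () => "I'm doing some heavy lifting",
});

const button = createObject(["work-on-click"]);
const before = raise(button, "clicked");

// Swap the behavior at runtime; the button itself is untouched.
defineBehavior("work-on-click", {
  triggers: ["clicked"],
  reaction: () => "Taking a break",
});
const after = raise(button, "clicked");

console.log(before, after);
```

Compare this with the jQuery version, where the callback is hidden inside the element once it’s attached.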

Here’s an implementation in ClojureScript. If you want to follow along with the tutorial, create a new file, for example /tmp/tutorial.cljs, press Ctrl-Space, type Add Connection and select Light Table UI. This will allow you to evaluate the ClojureScript directly into the running Light Table instance. But before continuing, add the following requires at the top of your file.

(ns lt.tutorial
+  (:require [lt.object :as object]
+            [lt.objs.tabs :as tabs]
+            [lt.objs.statusbar :as statusbar]
+            [lt.objs.notifos :as notifos]
+            [lt.util.js :as util])
+  (:require-macros [lt.macros :refer [behavior defui]]))
+

From now on you should just be able to evaluate the current form under the cursor with Cmd-Enter.

Next we need to define our button, using the defui macro

(defui work-button [this]
+  [:input {:type "submit" :value "Do work"}]
+  :click #(object/raise this :clicked %))
+

This bit of code is fairly straightforward: it results in an <input type="submit" value="Do work"/> with a click handler bound to our callback. #(object/raise this :clicked %) is just shorthand for (fn [e] (object/raise this :clicked e)), where object/raise raises an event on the target object, in this case a :clicked event. Despite its name, it has nothing to do with exceptions.

Next we need to define our worker object.

(object/object* ::worker
+                :name "A hard worker"
+                :behaviors [::work-on-click]
+                :init (fn [this] (work-button this)))
+

It’s a hard worker who works when you click on it. Also note that the value returned from the :init function is used when the object is placed inside a tab, in this case it returns our button, bound to this object.

The behavior we’re after will use the beautiful notifos library from Light Table, which displays these wonderful moving-squares-in-a-circle progress indicators.

(behavior ::work-on-click
+          :triggers #{:clicked}
+          :reaction (fn [this] 
+                      (notifos/working "Doing some heavy lifting!")
+                      (util/wait 10000 #(statusbar/loader-set 0))))
+

The behavior name has to be the same as in the object’s :behaviors list. It has a set of triggers which trigger the :reaction function with the object passed in as an argument. In our case we’ll just display a working indicator and then hide it after 10 seconds.

Now we’re ready to create the object and add it as a tab.

(let [worker (object/create ::worker)]
+  (tabs/add! worker)
+  (tabs/active! worker))
+

A new tab should appear with a button filling its content. When you click the button you should see a small progress bar at the bottom of the page, which will automatically disappear after 10 seconds.

Now you might’ve noticed that the tab can’t be closed. This is because there is no default behavior for closing a tab, since some tabs might want to prompt the user to save a file, others might have a completely different implementation. The good thing is that we can add this easily without having to restart Light Table.

We’ll add another behavior which responds to the :close event (taken from docs.cljs)

(behavior ::on-close-destroy
+          :triggers #{:close}
+          :reaction (fn [this]
+                      (when-let [ts (:lt.objs.tabs/tabset @this)]
+                        (when (= (count (:objs @ts)) 1)
+                          (tabs/rem-tabset ts)))
+                      (object/raise this :destroy)))
+

Next we need to tell our object to use this behavior by simply adding it to the behaviors list.

(object/object* ::worker
+                :name "A hard worker"
+                :behaviors [::work-on-click ::on-close-destroy]
+                :init (fn [this] (work-button this)))
+

You don’t need to restart anything, just eval the behavior and the object definition and you should be able to close the tab :) That’s how dynamic Light Table is.

For those who want to see the entire result, here’s a link to a gist and also a gif screencast of the whole process :)

This tutorial is really just a small introduction to what Light Table can do, but it should give you a little bit of insight into how dynamic the whole system actually is.

Discuss this post on Hacker News and Reddit

PostgreSQL Basics by Example

Mon, 19 Aug 2013 01:19:00 +0000

Connecting to a database

$ psql postgres     # the default database
+$ psql database_name
+

Connecting as a specific user

$ psql postgres john
+$ psql -U john postgres
+

Connecting to a host/port (by default psql uses a unix socket)

$ psql -h localhost -p 5432 postgres
+

You can also explicitly specify whether you want to be prompted for a password (-W) or not (-w)

$ psql -w postgres
+$ psql -W postgres
+Password:
+

Once you’re inside psql you can control the database. Here are a couple of handy commands

postgres=# \h                 # help on SQL commands
+postgres=# \?                 # help on psql commands, such as \? and \h
+postgres=# \l                 # list databases
+postgres=# \c database_name   # connect to a database
+postgres=# \d                 # list of tables
+postgres=# \d table_name      # schema of a given table
+postgres=# \du                # list roles
+postgres=# \e                 # edit in $EDITOR
+

At this point you can just type SQL statements and they’ll be executed on the database you’re currently connected to.

User Management

Once your application goes into production, or basically anywhere outside of your dev machine, you’re going to want to create some users and restrict access.

We have two options for creating users, either from the shell via createuser or via SQL CREATE ROLE

$ createuser john
+postgres=# CREATE ROLE john;
+

One thing to note here is that by default users created with CREATE ROLE can’t log in. To allow login you need to provide the LOGIN attribute

postgres=# CREATE ROLE john LOGIN;
postgres=# CREATE ROLE john WITH LOGIN; -- the same as above
postgres=# CREATE USER john;            -- alternative to CREATE ROLE which adds the LOGIN attribute

You can also add the LOGIN attribute with ALTER ROLE

postgres=# ALTER ROLE john LOGIN;
postgres=# ALTER ROLE john NOLOGIN;   -- remove login

You can also specify multiple attributes when using CREATE ROLE or ALTER ROLE, but bear in mind that ALTER ROLE leaves any attributes you don’t mention unchanged.

postgres=# CREATE ROLE deploy SUPERUSER LOGIN;
CREATE ROLE
postgres=# ALTER ROLE deploy NOSUPERUSER CREATEDB;  -- the LOGIN attribute is not touched here
ALTER ROLE
postgres=# \du deploy
           List of roles
 Role name | Attributes | Member of
-----------+------------+-----------
 deploy    | Create DB  | {}

There’s an alternative to CREATE ROLE john WITH LOGIN, and that’s CREATE USER, which automatically adds the LOGIN attribute. It is important to understand that users and roles are the same thing. In fact there’s no such thing as a user in PostgreSQL, only a role with the LOGIN attribute

postgres=# CREATE USER john;
+CREATE ROLE
+postgres=# CREATE ROLE kate;
+CREATE ROLE
+postgres=# \du
+                             List of roles
+ Role name |                   Attributes                   | Member of
+-----------+------------------------------------------------+-----------
+ darth     | Superuser, Create role, Create DB, Replication | {}
+ john      |                                                | {}
+ kate      | Cannot login                                   | {}
+

You can also create groups via CREATE GROUP (which is now aliased to CREATE ROLE), and then grant or revoke access to other roles.

postgres=# CREATE GROUP admin LOGIN;
+CREATE ROLE
+postgres=# GRANT admin TO john;
+GRANT ROLE
+postgres=# \du
+                             List of roles
+ Role name |                   Attributes                   | Member of
+-----------+------------------------------------------------+-----------
+ admin     |                                                | {}
+ darth     | Superuser, Create role, Create DB, Replication | {}
+ john      |                                                | {admin}
+ kate      | Cannot login                                   | {}
+postgres=# REVOKE admin FROM john;
+REVOKE ROLE
+postgres=# \du
+                             List of roles
+ Role name |                   Attributes                   | Member of
+-----------+------------------------------------------------+-----------
+ admin     |                                                | {}
+ darth     | Superuser, Create role, Create DB, Replication | {}
+ john      |                                                | {}
+ kate      | Cannot login                                   | {}
+

Ember.js: Testing Ember.js - part 1

Tue, 19 Feb 2013 19:05:00 +0000

Ever since I saw the testing slides from EmberCamp I’ve been thinking about testing. Up until now I’ve been using Capybara, which is really really really slow.

But @joliss mentioned this thing called Ember.testing which should automagically fix all of the async problems which make tests ugly, such as waiting for the application to initialize and finish routing.

In essence, Ember.testing = true disables the automatic runloop, which gives you the control to manually schedule asynchronous operations to happen in a one-off runloop via Ember.run.

Ember.run will run the given function inside a runloop and flush all of the bindings before it finishes, which means you can render a view inside Ember.run and check the DOM right after that. Here’s an example from the Ember.View tests

view = Ember.ContainerView.create({
+  childViews: ["child"],
+
+  child: Ember.View.create({
+    tagName: 'aside'
+  })
+});
+
+Ember.run(function(){
+  view.createElement();
+});
+
+equal(view.$('aside').length, 1);
+

As you can see, view.createElement() happens inside the runloop scheduled by Ember.run, which returns only after the view has been completely rendered and all bindings flushed.
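If the “flush before it returns” part sounds abstract, here’s a toy runloop in plain JavaScript. It is only a sketch of the idea and has nothing to do with how Ember actually implements it; pending, schedule and run are hypothetical names. Work scheduled inside run is queued and then executed before run returns, and scheduling outside of run fails loudly, which is roughly the experience you get with testing mode turned on.

```javascript
// Toy run loop, NOT Ember's implementation. `pending`, `schedule`
// and `run` are hypothetical names used only for this sketch.
const pending = [];
let insideRun = false;

// Deferred work (think: binding syncs, rendering) goes into a queue.
function schedule(fn) {
  if (!insideRun) {
    // Mirrors the idea of testing mode: no automatic run loop to fall back on.
    throw new Error("No run loop active, wrap this code in run()");
  }
  pending.push(fn);
}

// Run `fn`, then flush everything it scheduled before returning.
function run(fn) {
  insideRun = true;
  try {
    fn();
    while (pending.length > 0) {
      pending.shift()();
    }
  } finally {
    // Drop anything left over after an error and close the loop.
    pending.length = 0;
    insideRun = false;
  }
}

const log = [];
run(function () {
  schedule(() => log.push("rendered"));
  log.push("callback finished");
});
// By the time run() returns, the queued work has already happened.
log.push("after run");

console.log(log);
```

The scheduled function runs after the callback body finishes but before run returns, which is why an assertion placed right after the run call can already see the result.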

Let’s take a look at a complete example and take it apart step by step

// Testing mode disables automatic runloop
+Ember.testing = true;
+
+// Creating an application normally happens async,
+// which is why we have to wrap it in Ember.run
+Ember.run(function() {
+  App = Ember.Application.create();
+});
+
+App.Router.map(function() {
+  this.route("home", { path: "/" });
+});
+
+App.Store = DS.Store.extend({
+  revision: 11,
+  adapter: DS.FixtureAdapter.extend({
+    // This will make the FixtureAdapter do everything synchronously
+    // instead of using setTimeout, which is vital because setTimeout
+    // happens outside of the runloop.
+    simulateRemoteResponse: false
+  })
+});
+
+App.User = DS.Model.extend({ name: DS.attr("string")});
+App.User.FIXTURES = [ { id: 1, name: "brohuda" }];
+
+App.HomeRoute = Ember.Route.extend({
+  model: function() {
+    return App.User.find(1);
+  }
+});
+
+// Enabling Ember.testing will also disable automatic initialization,
+// which forces us to initialize manually
+Ember.run(function() {
+  App.initialize();
+});
+
+// In real life this would be an assertion,
+// here we'll just check if everything is rendered at this point in time.
+$("p strong").append($("h2").text());
+

Take the example apart, play with it and try to figure out what works and what doesn’t :)

If you see

assertion failed: You have turned on testing mode, which disabled the run-loop's autorun.
+You will need to wrap any code with asynchronous side-effects in an Ember.run
+

it means that you forgot to wrap something in Ember.run. I hope this is a good enough introduction. In one of the upcoming articles we’ll take a look at a simple Ember application and try testing it with a full featured testing framework.

Ember.js: render, control, partial, view, template

Sun, 10 Feb 2013 21:29:00 +0000

There are many ways one can DRY up templates when using Ember.js. It all depends on what you’re trying to achieve.

partial && template

{{partial "foo"}} will take a template foo.handlebars and insert it without changing anything, which is exactly the same as in Rails. There are no views created, no scope changes, it just inserts the template right there.

{{template}} isn’t really meant to be used anymore, so use {{partial}} instead.

view

{{view App.FooView}} will create an instance of App.FooView (with the foo.handlebars template unless you override the name) and insert it in place. You can bind on properties of the view, such as {{view App.FooView contentBinding="foobar"}}, or just specify a property directly {{view App.FooView class="foobar"}}.

This is a low level thing and is mostly used to instantiate simple views, such as {{view Ember.TextField valueBinding="name" class="username"}}

render && control

Most of the time you’re looking to use {{render}} instead of {{view}} as it offers better means of abstraction. {{render "foo" bar}} will create an App.FooController and bind its content to bar. It also creates an App.FooView and renders a foo template.

One drawback is that {{render}} cannot be called multiple times on a single route. If you need a self-contained widget which can be created any number of times you want, you’re looking for {{control}}, which has exactly the same effect as {{render}}, but it will have a new controller instance every time you call it, while {{render}} uses a singleton controller.

Please keep in mind that {{control}} is currently under heavy development and will probably change soon, because of the high number of issues there are with it.
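The singleton vs. new instance difference between {{render}} and {{control}} is easy to picture with a few lines of plain JavaScript. This is only an illustration of the idea, not Ember code; controllerCache, renderController and controlController are names I made up.

```javascript
// Toy illustration only; `controllerCache`, `renderController` and
// `controlController` are hypothetical and not part of Ember.
class FooController {
  constructor() {
    this.clicks = 0;
  }
}

const controllerCache = new Map();

// {{render}}-style lookup: one shared controller per name.
function renderController(name, Klass) {
  if (!controllerCache.has(name)) {
    controllerCache.set(name, new Klass());
  }
  return controllerCache.get(name);
}

// {{control}}-style lookup: a fresh controller for every call.
function controlController(Klass) {
  return new Klass();
}

const a = renderController("foo", FooController);
const b = renderController("foo", FooController);
a.clicks += 1;
console.log(a === b, b.clicks); // same instance, so state is shared

const c = controlController(FooController);
const d = controlController(FooController);
c.clicks += 1;
console.log(c === d, d.clicks); // separate instances, so state is independent
```

If a widget keeps its own state, a shared controller would make every copy of it show the same state, which is why a repeated widget wants the second flavor.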

Ember.js: Router Request Lifecycle

Fri, 08 Feb 2013 16:59:00 +0000

The router is a core part of Ember. Every time we go to a new URL, the matching route object has its hooks called with our params. These are the hooks, sorted in the order in which they are called

enter (private)
activate - executed when entering the route
deserialize (private)
model (formerly deserialize) - takes the params and returns a model which is set to the route’s currentModel
serialize - used to generate dynamic segments in the URL from a model
setupController - takes currentModel and sets it to the controller’s content by default
renderTemplate - takes the current controller and whatever model returned, and renders the template with an appropriate name
deactivate - executed when exiting the route (called by exit internally)
exit (private, requires call to this._super)

Now let’s take a look at them in more detail

`activate`/`deactivate`

These were formerly known as enter/exit, which are now marked as private. activate will be executed when the user enters a route, be it from a transition or from a URL directly, and deactivate is executed when the user transitions away from the route.

One of the most common use cases for me is doing a transaction rollback in deactivate.

App.PostsNewRoute = Ember.Route.extend({
+
+  deactivate: function() {
+    this.modelFor("postsNew").get("transaction").rollback();
+  }
+
+});
+

I find this mostly useful when having a new record form (or even when editing a record), where you basically want to rollback any changes which happened when the user exits the route. It doesn’t matter if the user submits the form first, because then the transaction will be committed and there will be nothing to rollback.

`model`/`serialize`

To allow Ember to work with dynamic segments in the URLs we need to teach it how to serialize and deserialize our models. When we enter a URL directly (or reload the page) model will be called with params from the dynamic segments. Let’s take a look at an example

App.Router.map(function() {
+  this.resource("post", { path: "/:post_id" });
+});
+
+App.PostRoute = Ember.Route.extend({
+
+  model: function(params) {
+    return App.Post.find(params.post_id);
+  }
+
+});
+

This is exactly what Ember will auto generate for us, along with a serialize hook

App.PostRoute = Ember.Route.extend({
+
+  model: function(params) {
+    return App.Post.find(params.post_id);
+  },
+
+  serialize: function(model) {
+    return { post_id: model.id };
+  }
+
+});
+

It is important to note here that if we’re transitioning from a different route our model hook will not be called.
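Here’s a simplified plain JavaScript sketch of that difference. It is not Ember’s router code; enterRoute and the shape of its arguments are made up, and I’m assuming the transition already carries the model object. The route only has to look the model up when all it has is the params from the URL.

```javascript
// Simplified sketch of the behaviour described above, not Ember's router code.
// `enterRoute` and the shape of its arguments are hypothetical.
function enterRoute(route, { params, model } = {}) {
  const called = [];
  let currentModel = model;

  if (currentModel === undefined) {
    // Entered via URL: resolve the model from the dynamic segments.
    currentModel = route.model(params);
    called.push("model");
  }

  // Either way the controller gets set up with the model.
  const controller = {};
  route.setupController(controller, currentModel);
  called.push("setupController");

  return { called, controller };
}

const posts = { 1: { id: 1, title: "Hello" } };

const postRoute = {
  model(params) {
    return posts[params.post_id];
  },
  serialize(model) {
    return { post_id: model.id };
  },
  setupController(controller, model) {
    controller.content = model;
  },
};

// Page load on /1: the model hook runs.
const fromUrl = enterRoute(postRoute, { params: { post_id: 1 } });

// Transition with a model in hand: the model hook is skipped,
// and serialize is what produces the URL segment.
const fromTransition = enterRoute(postRoute, { model: posts[1] });
const segment = postRoute.serialize(posts[1]);

console.log(fromUrl.called, fromTransition.called, segment);
```

This is also why any setup that has to happen on every entry is risky to keep in the model hook, since a transition with a model would skip it.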

`setupController`

One step further after model comes setupController, which is meant to set additional properties on the controller, or to override its content.

But beware, there is no auto-generated setupController hook which sets the content. This is done in the setup hook of the route, even before setupController is called. It basically simulates the following:

setupController: function(controller, model) {
+  controller.set("content", model);
+}
+

But it also means we can set additional properties on the controller without needing to explicitly set the content:

setupController: function(controller, model) {
+  controller.set("foo", "bar");
+}
+

`renderTemplate`

The last of the hooks is renderTemplate, where you specify which template to render into which outlet.

By default renderTemplate will call this.render as follows:

App.PostRoute = Ember.Route.extend({

  renderTemplate: function() {
    this.render("post", {
      into: "application",
      outlet: "main",
      controller: "post"
    });
  }

});

In this case render will render the post template into the application template’s main outlet with the PostController.

This is the place where you can choose to render into other outlets. For example, let’s say that your application template has a sidebar outlet {{outlet sidebar}}.

App.PostRoute = Ember.Route.extend({

  renderTemplate: function() {
    // render with the defaults
    this.render();

    // and once more for the sidebar outlet
    this.render("similarPosts", {
      into: "application",
      outlet: "sidebar"
    });
  }

});

Important notes about `controllerFor` and `modelFor`

While calling controllerFor("posts") returns an instance of PostsController, calling modelFor("posts") doesn’t return the content of the PostsController. Instead it looks up the PostsRoute and returns its currentModel, which is set when we return a value from the model hook.

Let’s see an example

App.PostsRoute = Ember.Route.extend({
+
+  setupController: function(controller) {
+    controller.set("content", App.Post.find());
+  }
+
+});
+

This will cause issues if we decide to use modelFor later on. PostsRoute will not have anything in currentModel and modelFor will return undefined, which might look weird as the controller has its content property set.
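The difference is easier to see in a stripped-down model of a route. The sketch below is plain JavaScript and not Ember’s implementation; `TinyRoute` and its `enter` method are hypothetical names that only mimic the order in which the model is stored and the controller is set up.

```javascript
// A toy route: remembers what `model` returned and sets up a controller.
function TinyRoute(hooks) {
  this.hooks = hooks;
  this.currentModel = undefined;
  this.controller = { content: undefined };
}

TinyRoute.prototype.enter = function () {
  // Whatever the model hook returns is remembered on the route...
  this.currentModel = this.hooks.model ? this.hooks.model() : undefined;
  // ...and becomes the controller's content by default.
  this.controller.content = this.currentModel;
  if (this.hooks.setupController) {
    this.hooks.setupController(this.controller, this.currentModel);
  }
};

// Toy counterpart of modelFor: it reads the route, not the controller.
TinyRoute.prototype.modelFor = function () {
  return this.currentModel;
};

var allPosts = [{ id: 1 }, { id: 2 }];

// Content set only in setupController: the route never sees the model.
var viaSetup = new TinyRoute({
  setupController: function (controller) {
    controller.content = allPosts;
  }
});
viaSetup.enter();
console.log(viaSetup.controller.content.length); // 2
console.log(viaSetup.modelFor());                // undefined

// Model returned from the model hook: both places agree.
var viaModel = new TinyRoute({
  model: function () { return allPosts; }
});
viaModel.enter();
console.log(viaModel.modelFor() === viaModel.controller.content); // true
```

In other words, returning the records from the model hook keeps the route and the controller in sync, while setting content by hand in setupController only updates the controller.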

Ember.js: Using Transactions in Ember Data - part 1

Sat, 02 Feb 2013 14:15:00 +0000

We talked about transactions in one of the previous articles (read it if you haven’t already), but we didn’t really touch on when to use them in the real world. One of the most common use cases for me is when I just want to manage a single record while there are many changes happening on the page.

Adding a record to a transaction is simple

// say that we are in a controller
+store = this.get("store");
+
+// this ALWAYS returns a new transaction
+transaction = store.transaction();
+
+user = App.User.find(1);
+transaction.add(user);
+
+transaction.toString(); // => "<DS.Transaction:ember955>"
+

Now this is obvious, but what if we need to commit the transaction in a completely different action? Do we need to store the instance somewhere to use it later?

The answer is NO. We can always get the transaction a record belongs to by calling .get("transaction") on it. We can even do it if we decide to fetch the user again in a completely different part of the application.

user = App.User.find(1);
+user.get("transaction").toString(); // => "<DS.Transaction:ember955>"
+

It doesn’t matter in which part of the application you add the record to a transaction, because you can always retrieve the correct instance later.

This allows us to do something like this:

App.UsersNewRoute = Ember.Route.extend({
+  model: function() {
+    var transaction = this.get("store").transaction();
+
+    var user = transaction.createRecord(App.User, {});
+    return user;
+  },
+
+  events: {
+    createUser: function(user) {
+      user.get("transaction").commit();
+    }
+  }
+});
+

Personally I use this when I only care about one record, but I know that there might be others which are dirty and I don’t want to commit those. This happens almost every time you have two forms displayed at once.

Ember.js: Router and Template Naming Convention

Fri, 01 Feb 2013 19:43:00 +0000

Ever since the change to resource and route, a lot of people are confused about the meaning of the two and how they affect naming. Here’s the difference:

resource - a thing
route - something to do with the thing

Let’s say we have a model App.Post and we want to show a list of posts and a new post form. There are many ways you can go about this, so let’s start with the simplest.

App.Router.map(function() {
+  this.resource("posts", { path: "/" });
+  this.route("new", { path: "/new" });
+});
+

This would result in the following template structure

<script type="text/x-handlebars" data-template-name="posts">
+  ... list the posts
+</script>
+
+<script type="text/x-handlebars" data-template-name="new">
+  ... new post template
+</script>
+

With the following naming

PostsRoute
+PostsController
+PostsView
+NewRoute
+NewController
+NewView
+

Here’s a JSBin

This is almost never useful, since you might have many /new actions and you’d need to scope them to the resource, which would be done as follows:

App.Router.map(function() {
+  this.resource("posts", { path: "/" }, function() {
+    this.route("new", { path: "/new" });
+  });
+});
+

Here things get a little more complicated, since we’re nesting something inside the resource. This means that we’ll end up with three templates instead of two:

<script type="text/x-handlebars" data-template-name="posts">
+  <h1>This is the outlet</h1>
+
+  {{outlet}}
+</script>
+
+<script type="text/x-handlebars" data-template-name="posts/index">
+  ... list the posts
+</script>
+
+<script type="text/x-handlebars" data-template-name="posts/new">
+  ... new post template
+</script>
+

With the following naming

PostsRoute
+PostsController
+PostsView
+
+PostsIndexRoute
+PostsIndexController
+PostsIndexView
+
+PostsNewRoute
+PostsNewController
+PostsNewView
+

Here’s a JSBin

This means that whenever you create a resource, it will create a brand new namespace. That namespace gets a template named after the resource, and all of the child routes will be inserted into its {{outlet}}.

There are many reasons behind it, but let’s try another example which will make it more obvious. We will add /:post_id and /:post_id/edit routes.

App.Router.map(function() {
+  this.resource("posts", { path: "/" }, function() {
+    this.resource("post", { path: "/:post_id" }, function() {
+      this.route("edit", { path: "/edit" });
+    });
+
+    this.route("new", { path: "/new" });
+  });
+});
+

In addition to the routes in the previous example, this will give us:

// IMPORTANT - it's not PostsPostRoute, because `resource`
+// always creates a new namespace
+PostRoute 
+PostController
+PostView
+
+PostIndexRoute
+PostIndexController
+PostIndexView
+
+PostEditRoute
+PostEditController
+PostEditView
+

Templates are named accordingly: post, post.index and post.edit. There is nothing like posts.post.index, posts.post or posts.post.edit.
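The naming rule can be written down as a small function. This is only a sketch in plain JavaScript of the convention described here, not the router’s actual code; `collectNames` and the `{ type, name, children }` tree format are made up for the illustration.

```javascript
// Capitalize the first letter: "posts" -> "Posts".
function capitalize(name) {
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Walk a route tree and collect the class name prefixes.
// `node.type` is "resource" or "route"; `namespace` is the enclosing resource.
function collectNames(nodes, namespace, out) {
  nodes.forEach(function (node) {
    if (node.type === "resource") {
      // A resource ignores the parent namespace and starts a new one.
      out.push(capitalize(node.name));
      if (node.children) {
        // A resource with nested routes also gets an implicit index route.
        out.push(capitalize(node.name) + "Index");
        collectNames(node.children, node.name, out);
      }
    } else {
      // A route is prefixed with the resource it is nested in, if any.
      out.push(capitalize(namespace) + capitalize(node.name));
    }
  });
  return out;
}

var tree = [
  { type: "resource", name: "posts", children: [
    { type: "resource", name: "post", children: [
      { type: "route", name: "edit" }
    ] },
    { type: "route", name: "new" }
  ] }
];

var names = collectNames(tree, "", []);
console.log(names.join(", "));
// Posts, PostsIndex, Post, PostIndex, PostEdit, PostsNew
```

Append Route, Controller or View to each prefix to get the class names listed above. Note that there is no PostsPost prefix anywhere in the output.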

Here’s a JSBin

But a problem appears when we try to access the App.Post model from the post/index or post/edit template. It is only available in the post template with the outlet. Now why is that?

Since we are defining a resource, it is expected that the child routes will be related to that resource, which is why they don’t need to load it separately. They can access it from the parent PostController via needs (more about that can be found in this article).

Here’s a JSBin

This is the general pattern you would use if you want to nest everything. But what if you don’t want to render post into the posts outlet? Well, nothing prevents you from defining the routes like this:

App.Router.map(function() {
+  this.resource("posts", { path: "/" }, function() {
+    this.route("new", { path: "/new" });
+  });
+
+  this.resource("post", { path: "/:post_id" }, function() {
+    this.route("edit", { path: "/edit" });
+  });
+});
+

What is the difference? The naming remains exactly the same as in the previous example, even the templates are named the same. But the post template will be inserted into the application layout, not inside the posts layout. This is what you want when the post detail page should replace the whole layout, instead of being shown together with the posts list.

I hope the examples will help you understand how the v2 routes work, since this is an essential part of Ember.js.

If you have any questions, leave them in the comments or tweet me @darthdeus.

Ember.js: How to find a model by any attribute in Ember.js

Thu, 31 Jan 2013 23:13:00 +0000

One of the common things people ask about Ember Data is how to find a single record by one of its attributes. This is because the current revision (11) only offers three methods of fetching records:

App.User.find(1) // returns a single user record
+App.User.find({ username: "wycats" }) // returns a ManyArray
+App.User.findQuery({ username: "wycats" }) // same as the above
+

If you want to search for a user by his username, you have two options

Using .find with smart server side

The way App.User.find(1) works is that it does a request to /users/1, which is expected to return just one record.

You could modify your server to accept both a username and an id on the /users/1 path, which would allow you to do App.User.find("wycats").

There’s an issue with this though. If you load the same user via both his username and his id, you’ll end up with two records stored in the Ember identity map.

Which basically means that if you try to retrieve all of the user +records, you will end up with that one user twice.
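To make the duplication concrete, here is a minimal sketch in plain JavaScript. It is not Ember Data’s identity map; `TinyIdentityMap` and `fetchUser` are hypothetical names, and the fake server simply answers to both an id and a username.

```javascript
// Hypothetical stand-in for a server that accepts both an id and a username.
function fetchUser(key) {
  var row = { id: 1, username: "wycats" };
  return (String(key) === String(row.id) || key === row.username) ? row : null;
}

// A toy identity map: it caches records under the key used to find them.
function TinyIdentityMap() {
  this.records = {};
}

TinyIdentityMap.prototype.find = function (key) {
  if (!(key in this.records)) {
    // Each distinct lookup key produces its own record object.
    this.records[key] = { id: key, data: fetchUser(key) };
  }
  return this.records[key];
};

TinyIdentityMap.prototype.all = function () {
  var records = this.records;
  return Object.keys(records).map(function (key) {
    return records[key];
  });
};

var map = new TinyIdentityMap();
var byId = map.find(1);
var byUsername = map.find("wycats");

console.log(byId === byUsername); // false, two separate records
console.log(map.all().length);    // 2, although the server only has one user
```

The map has no way of knowing that the keys 1 and "wycats" point at the same row, so it keeps one record per key.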

If you want to read more about this, check out this GitHub issue.

Using a findQuery

This might not seem like the right solution at first, since it returns a DS.ManyArray instead of just one record, but hang on.

DS.ManyArray is a subclass of DS.RecordArray, which includes a DS.LoadPromise.

To understand how DS.LoadPromise works, we need to understand what promises are. There’s a great article about that, so I won’t go into much detail.

A promise is basically an async monad (I guess that doesn’t help, let’s try again).

A promise is something which allows you to return an object which wraps around a value, even if you don’t have the value yet. For example if you’re doing App.User.findQuery, you’ll get back an empty DS.ManyArray instantly.

It doesn’t wait until the AJAX request is finished, it just returns the empty array, which is populated with the data once the request finishes.

This works because Ember uses data bindings and will automagically update all of the views once the data is loaded. And also because the router will wait if its model is in the isLoading state. That way you won’t display a page which is half loaded.

Implementation

Now that we know we’re getting a DS.ManyArray, we need to figure out a way to make it represent only the value of its first element, because that’s what we care about.

var users = App.User.findQuery({ username: username });
+
+users.one("didLoad", function() {
+  users.resolve(users.get("firstObject"));
+});
+
+return users;
+

You can see that we are returning the result of the findQuery instantly, but we’re also setting an asynchronous callback which resolves the promise to the firstObject once it is loaded.

Another way you could read resolve(x) is "from now on you’re representing the value x". This technique will work everywhere in Ember, because the data bindings will take care of everything. Always remember that you don’t need to worry about re-rendering your views, just change the data and Ember will take care of the rest.

Ember.js: Controller, ObjectController and ObjectProxy

Sun, 27 Jan 2013 19:24:00 +0000

When you first come to Ember, you’ll soon stumble upon three things:

Ember.Controller
Ember.ObjectController
Ember.ArrayController

For some people (including me) it is not very clear what the difference between the first two is.

Ember.Controller is just a plain implementation of Ember.ControllerMixin, while Ember.ObjectController is a subclass of Ember.ObjectProxy. This is a huge difference! Let’s take a look at how Ember.ObjectProxy works, as always starting with a code sample (taken from the excellent source code documentation).

object = Ember.Object.create({
+  name: "foo"
+});
+
+proxy = Ember.ObjectProxy.create({
+  content: object
+});
+
+// Access and change existing properties
+proxy.get("name") // => "foo"
+proxy.set("name", "bar");
+object.get("name") // => "bar"
+
+// Create new "description" property on `object`
+proxy.set("description", "baz");
+object.get("description") // => "baz"
+

There is really no magic. In basic usage, Ember.ObjectProxy will delegate all of its unknown properties to the content object, with one exception.

If we try to set a new property on a proxy while its content is undefined, we will get an exception.

proxy = Ember.ObjectProxy.create();
+proxy.set("foo", "bar"); // raises the following exception
+

Cannot delegate set('foo', bar) to the 'content' property
+of object proxy <Ember.ObjectProxy:ember420>: its 'content' is undefined.
+

I’ve stumbled upon this in one scenario, where I didn’t set content for my ObjectController, but tried to modify one of its properties. Raising the exception is a good example of failing fast, rather than silently swallowing errors.
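The delegation and the fail-fast behaviour fit in a few lines of plain JavaScript. The following is a rough sketch and not how Ember.ObjectProxy is implemented; `TinyObjectProxy` is a hypothetical name, and returning undefined from `get` when there is no content is a choice made for this toy version.

```javascript
// A toy proxy that forwards get/set to `content`.
function TinyObjectProxy(content) {
  this.content = content;
}

TinyObjectProxy.prototype.get = function (key) {
  // With no content there is nothing to read from.
  return this.content === undefined ? undefined : this.content[key];
};

TinyObjectProxy.prototype.set = function (key, value) {
  if (this.content === undefined) {
    // Fail fast instead of silently dropping the write.
    throw new Error("Cannot delegate set('" + key + "'): 'content' is undefined");
  }
  this.content[key] = value;
  return value;
};

var object = { name: "foo" };
var proxy = new TinyObjectProxy(object);

proxy.set("name", "bar");
console.log(object.name); // "bar"

var empty = new TinyObjectProxy();
try {
  empty.set("foo", "bar");
} catch (error) {
  console.log(error.message);
}
```

The write goes straight through to the wrapped object, and a proxy without content refuses the write loudly.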

That being said, you should almost always use Ember.ObjectController over Ember.Controller, unless you know what you’re doing :)

Ember.js: State Manager and Friends - part 1

Sun, 27 Jan 2013 19:23:00 +0000

Since state management is such a huge part of Ember.js, it deserves a dedicated article. I’m not going to explain the old router, which used Ember.StateManager to do its bidding. Those days are over and we should all be moving towards the v2 router (or v2.2 so to speak). Instead we’re going to go deep into Ember.StateManager itself.

In general, a state manager is an object which manages states and the transitions between them, thus representing a finite state machine.

Let’s say we have a Post which can be in two states, draft and published. It begins its life as a draft and when we publish it, it should send out a notification email. The way Ember would handle this is that it would assign an Ember.StateManager instance to the Post instance and have that manage its state (that’s not exactly true in Ember Data, but we’ll get into that).

For now let’s just say that this is the code we have

PostManager = Ember.StateManager.extend({
+  states: {
+    draft: Ember.State.create(),
+    published: Ember.State.create()
+  }
+});
+
+Post = Ember.Object.extend({
+  title: null,
+  init: function() {
+    this.set("stateManager", PostManager.create());
+    this._super();
+  }
+});
+

This gives us a really basic implementation. I’m setting the stateManager property in the init function to avoid sharing the instance across multiple Post instances. I’ll explain this in a follow-up article. For now just remember that if you need to set a property to an object instance, you have to do that in the init function, not directly like stateManager: PostManager.create().

OK, we are now ready to list all of the states a Post can have.

post = Post.create();
+post.get("stateManager.states"); // => { draft: ..., published: ... }
+
+post.get("stateManager.currentState"); // => null
+

We forgot to say which of the states should be the default. Let’s do that.

PostManager = Ember.StateManager.extend({
+  initialState: "draft",
+  states: {
+    draft: Ember.State.create(),
+    published: Ember.State.create()
+  }
+});
+

From now on every single post we create will be a draft

post = Post.create();
+post.get("stateManager.currentState.name"); // => "draft"
+

And we can also make it transition into another state

post = Post.create();
+post.get("stateManager").transitionTo("published");
+post.get("stateManager.currentState.name"); // => "published"
+

But Ember.StateManager can do more than that. We can hook into both the enter and exit events on each state and do some magic! Let’s redefine our state manager like this:

PostManager = Ember.StateManager.extend({
+  initialState: "draft",
+  states: {
+    draft: Ember.State.create(),
+    published: Ember.State.create({
+      enter: function() {
+        console.log("post was published");
+      }
+    })
+  }
+});
+
+post = Post.create();
+post.get("stateManager").transitionTo("published");
+// console prints "post was published"
+

Understanding how this class works is essential for any Ember developer, as it is used in almost every part of the framework. We’ll take a look at some specific examples in the second part of this article.

Ember.js: Concatenated Properties

Sun, 27 Jan 2013 18:18:00 +0000

As some of you might know, Ember provides something called concatenated properties. Their main use case is internal, which means you are unlikely to need them in your own application. Still, there are some places in Ember where you might be surprised by how things behave, and this might be one of those. Let’s start with an example.

App.UserView = Ember.View.extend({
+  classNames: ["user"]
+});
+
+App.UserView.create().get("classNames") // => ["ember-view", "user"]
+

Now you might be asking, where is the "ember-view" coming from? Time for another example:

// extend the view from the previous example, not the App.User model
App.DetailUserView = App.UserView.extend({
  classNames: ["more", "detail"]
});

App.DetailUserView.create().get("classNames") // => ["ember-view", "user", "more", "detail"]

This must be some sorcery! It seems that classNames aren’t overwritten in the subclass, but rather concatenated to the superclass’s value of that property. This works even when you overwrite it in an instance.

Ember.View.create({ classNames: ["cat"] }).get("classNames") // => ["ember-view", "cat"]
+

A simple glance at the Ember.View source code reveals its secrets:

Ember.View = Ember.CoreView.extend({
+
+  concatenatedProperties: ['classNames', 'classNameBindings', 'attributeBindings'],
+
+  // more stuff
+

If this still doesn’t make any sense to you, just go take a look at the tests for concatenated properties.
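The mechanism itself can also be imitated in plain JavaScript. The sketch below is not Ember’s implementation; the `extend` helper is a hypothetical function that copies properties from a parent description and concatenates only the ones listed in concatenatedProperties.

```javascript
// Build a "class" description whose listed properties are concatenated
// with the parent's values instead of replacing them.
function extend(parent, props) {
  var child = Object.assign({}, parent, props);
  var concatenated = child.concatenatedProperties || [];
  concatenated.forEach(function (key) {
    var inherited = (parent && parent[key]) || [];
    var own = (props && props[key]) || [];
    child[key] = inherited.concat(own);
  });
  return child;
}

var View = extend(null, {
  concatenatedProperties: ["classNames"],
  classNames: ["ember-view"],
  tagName: "div"
});
var UserView = extend(View, { classNames: ["user"], tagName: "li" });
var DetailUserView = extend(UserView, { classNames: ["more", "detail"] });

console.log(DetailUserView.classNames); // ["ember-view", "user", "more", "detail"]
console.log(DetailUserView.tagName);    // "li", ordinary properties are overwritten
```

Only classNames grows along the chain. tagName behaves like any other property and the most specific value wins.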

Ember.js: Ember Data in Depth

Sun, 27 Jan 2013 13:52:00 +0000

This is a guide explaining how Ember Data works internally. My initial motivation for writing this is to understand Ember better myself. I’ve found that every time I understand something about how Ember works, it improves my application code.

Main parts

First we need to understand what the main concepts are. Let’s start with a simple example.

App.User = DS.Model.extend({
+  username: DS.attr("string")
+});
+

Let’s dive deep into this. There are four important concepts, two of which are basic Ember.js and we’re going to skip them:

App.User represents a User class in the App namespace
username represents a property on the User class

These are the basics and you should be familiar with them to understand the rest of this guide. Next we have DS.Model and DS.attr:

DS.Model and DS.attr

DS.Model is one of the core concepts in Ember Data and it represents a single resource. Models can have relationships with other models, similar to how you’d model your data in a relational database. But let’s ignore that for now.

DS.Model is both a state machine and a promise. If you don’t understand what promises are, please take a look at this awesome article which explains them in depth.

State machines are used throughout Ember and they basically represent something which can have multiple states and can transition between the states. For example DS.Model can have the following states (taken from the official Ember guide):

isLoaded - The adapter has finished retrieving the current state of the record from its backend.
isDirty - The record has local changes that have not yet been saved by the adapter. This includes records that have been created (but not yet saved) or deleted.
isSaving - The record has been sent to the adapter to have its changes saved to the backend, but the adapter has not yet confirmed that the changes were successful.
isDeleted - The record was marked for deletion. When isDeleted is true and isDirty is true, the record is deleted locally but the deletion was not yet persisted. When isSaving is true, the change is in-flight. When both isDirty and isSaving are false, the change has been saved.
isError - The adapter reported that it was unable to save local changes to the backend. This may also result in the record having its isValid property become false if the adapter reported that server-side validations failed.
isNew - The record was created locally and the adapter did not yet report that it was successfully saved.
isValid - No client-side validations have failed and the adapter did not report any server-side validation failures.

We can also bind to these with event handlers, which will be explained later, but for now let’s just list them:

didLoad
didCreate
didUpdate
didDelete
becameError
becameInvalid

I would also encourage you to go take a look at the source documentation on GitHub

It is important for us to understand what each state means, because they can affect how our application behaves. For example if we try to modify a record which is already being saved, we will get an exception saying something like this:

Attempted to handle event `willSetProperty` on <App.User:ember1144:null>
+while in state rootState.loaded.created.inFlight. Called with
+{reference: [object Object], store: <App.Store:ember313>, name: username}
+

The important part here is the rootState.loaded.created.inFlight. If we look at the source of DirtyState, we can see what this means:

Dirty states have three child states:

uncommitted: the store has not yet handed off the record to be saved.
inFlight: the store has handed off the record to be saved, but the adapter has not yet acknowledged success.
invalid: the record has invalid information and cannot be send to the adapter yet.

Let’s go through the record lifecycle and observe its state. We can do this by calling .get("stateManager.currentState.name"):

user = App.User.find(1)
user.get("isLoaded") // => true
user.get("isDirty") // => false
user.get("stateManager.currentState.name") // => loaded

user.set("username", "wycats")
user.get("isLoaded") // => true
user.get("isDirty") // => true, which means committing the transaction will save the record
user.get("stateManager.currentState.name") // => uncommitted

user.get("transaction").commit()
// while the record is being saved
user.get("stateManager.currentState.name") // => inFlight
user.get("isSaving") // => true
// after the record was saved
user.get("stateManager.currentState.name") // => saved

Transactions and `commit()`

In the previous example, we’ve used get("transaction").commit() to persist the changes to the server. .commit() will take all dirty records in the transaction and persist them to the server.

A record becomes dirty whenever one of its attributes changes. For example:

user = App.User.find(1)
+user.get("isDirty") // => false
+user.set("username", "wycats")
+user.get("isDirty") // => true
+

If we create a new record, it will be dirty by default

user = App.User.createRecord()
+user.get("isDirty") // => true
+

Currently there’s a regression where, if we change an attribute to something else and then back to the original value, the record will still be marked as dirty.

user = App.User.find(1)
+originalUsername = user.get("username")
+
+user.get("isDirty") // => false
+user.set("username", "wycats")
+user.get("isDirty") // => true
+user.set("username", originalUsername)
+user.get("isDirty") // => true, even though it should be false
+

But let’s hope this will be fixed soon.

Transactions

Until now we assumed that there is some global transaction which is the same for every single model. But this doesn’t have to be true. We can create our own transactions and manage them at will.

I recommend you take a look at the tests for transactions in the Ember Data repository. They basically show all of the scenarios which you can encounter. For example:

transaction = store.transaction();
+record = transaction.createRecord(App.User, {});
+
+transaction.commit(); // this will save the record to the server
+
+record.set("foo", "bar");
+transaction.commit(); // nothing is committed here, because the record
+                      // is removed from the transaction when it is saved
+
+store.commit(); // this will save the record properly
+

We can also add a record to a transaction, which will remove it from the global transaction. An important thing to note here is that store.transaction() always returns a new transaction.

user = App.User.find(1);
+transaction = store.transaction();
+transaction.add(user);
+
+user.set("username", "wycats");
+
+store.commit(); // nothing happens
+transaction.commit(); // user is saved
+
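The rule behind this is that a record lives in exactly one transaction at a time. Here is a small sketch of that bookkeeping in plain JavaScript. It is not Ember Data’s code; `TinyTransaction` and the `isDirty` flag on plain objects are simplifications made up for the illustration, and `commit` only reports which records would be saved.

```javascript
// A toy transaction that owns a list of records.
function TinyTransaction(name) {
  this.name = name;
  this.records = [];
}

TinyTransaction.prototype.add = function (record) {
  // A record lives in exactly one transaction, so leave the old one first.
  if (record.transaction) {
    var old = record.transaction.records;
    old.splice(old.indexOf(record), 1);
  }
  this.records.push(record);
  record.transaction = this;
};

TinyTransaction.prototype.commit = function () {
  // Return the dirty records that would be saved and mark them clean.
  var saved = this.records.filter(function (record) {
    return record.isDirty;
  });
  saved.forEach(function (record) {
    record.isDirty = false;
  });
  return saved;
};

var defaultTransaction = new TinyTransaction("default");
var user = { id: 1, isDirty: false };
defaultTransaction.add(user);

var transaction = new TinyTransaction("custom");
transaction.add(user);
user.isDirty = true;

console.log(defaultTransaction.commit().length); // 0, the user has moved away
console.log(user.transaction === transaction);   // true
console.log(transaction.commit().length);        // 1, the user is saved here
```

Because the record carries a reference to its transaction, any code that has the record can find the right transaction to commit.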

Same goes for deleting records

user = App.User.find(1);
+transaction = store.transaction();
+transaction.add(user);
+
+user.deleteRecord();
+
+store.commit(); // nothing happens
+transaction.commit(); // user is deleted
+

We can also remove a record from a transaction

user = App.User.find(1);
+transaction = store.transaction();
+
+transaction.add(user);
+transaction.remove(user);
+
+user.set("name", "wycats");
+
+transaction.commit(); // nothing happens
+

One scenario where transactions can be useful is when you just need to save a change to one record, without committing changes to other records. You can put that change in a separate transaction, instead of just doing store.commit().

An important thing to note here is that the store has a defaultTransaction, which you can get via store.get("defaultTransaction"). This is where all of the records are placed, unless you explicitly create a new transaction and assign a record to it.

These two are completely equivalent

store.commit();
+store.get("defaultTransaction").commit();
+

Just take a look at how store.commit() is defined

commit: function() {
+  get(this, 'defaultTransaction').commit();
+},
+

`commit()`

Now that we understand how transactions work, let’s dig deep into store.commit(). The first thing we need to understand here is that Ember Data transactions use so-called buckets to store records in various states. These are first initialized in the init method of DS.Transaction:

init: function() {
+  set(this, 'buckets', {
+    clean:    Ember.OrderedSet.create(),
+    created:  Ember.OrderedSet.create(),
+    updated:  Ember.OrderedSet.create(),
+    deleted:  Ember.OrderedSet.create(),
+    inflight: Ember.OrderedSet.create()
+  });
+
+  set(this, 'relationships', Ember.OrderedSet.create());
+}
+

Each bucket represents one state in which a record can possibly be. These are used in many different places in the transaction, and every time a record changes its state, it will be moved to the corresponding bucket:

recordBecameDirty: function(bucketType, record) {
+  this.removeFromBucket('clean', record);
+  this.addToBucket(bucketType, record);
+},
+
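To get a feeling for how the buckets move records around, here is a toy version in plain JavaScript. It uses the built-in Set instead of Ember.OrderedSet and is only a sketch; `BucketTransaction` and the `bucketOf` helper are hypothetical and not part of Ember Data.

```javascript
// Toy transaction that only tracks which bucket each record is in.
function BucketTransaction() {
  this.buckets = {
    clean: new Set(),
    created: new Set(),
    updated: new Set(),
    deleted: new Set(),
    inflight: new Set()
  };
}

// New members start out clean.
BucketTransaction.prototype.add = function (record) {
  this.buckets.clean.add(record);
};

// Same shape as the method quoted above, with the Set operations inlined.
BucketTransaction.prototype.recordBecameDirty = function (bucketType, record) {
  this.buckets.clean.delete(record);
  this.buckets[bucketType].add(record);
};

// Hypothetical helper: name of the bucket that currently holds the record.
BucketTransaction.prototype.bucketOf = function (record) {
  var buckets = this.buckets;
  return Object.keys(buckets).filter(function (name) {
    return buckets[name].has(record);
  })[0];
};

var transaction = new BucketTransaction();
var user = { id: 1 };

transaction.add(user);
console.log(transaction.bucketOf(user)); // "clean"

transaction.recordBecameDirty("updated", user);
console.log(transaction.bucketOf(user)); // "updated"
```

At commit time a transaction only has to look at the created, updated and deleted buckets to know what to send to the adapter.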

More content will be coming soon

Ember.js: Controller's Needs Explained

Sun, 27 Jan 2013 11:53:00 +0000

Since the v2 router arrived, it became clear that using global singleton controllers like App.userController = App.UserController.create() is not the way to go. This prevents us from doing a simple binding like

App.UserController = Ember.ObjectController.extend({
+  accountsBinding: "App.accountsController.content"
+})
+

There is no need, or even possibility, to manage the controller instances with the new router though. It will create the instances for us. One way to get at them is this.controllerFor, which can be used inside of a route.

App.UserRoute = Ember.Route.extend({
+  setupController: function(controller, model) {
+    // some magic with `this.controllerFor("user")`
+  }
+})
+

But since this method is only available on the route and not inside a controller, it wasn’t very pleasant to specify dependencies (or needs) between controllers. Which is exactly where needs come in and solve the issue:

App.UserController = Ember.ObjectController.extend({
+  needs: ["foo"]
+});
+

This will give you the opportunity to call controllers.foo on the App.UserController instance and get back an instance of App.FooController. You could even (ab)use that in the templates like this:

<!-- inside `users` template -->
{{controllers.foo}}

Needs vs routing

Needs become incredibly useful when you have nested routes, for example

App.Router.map(function() {
+  this.resource("post", { path: "/posts/:post_id" }, function() {
+    this.route("edit", { path: "/edit" });
+  });
+});
+

In this case we will get post, post.index and post.edit. If you go to /posts/1 you expect to get the post.index template, which is true, but the context (or model, or content) is being set on the PostController, not on the PostIndexController.

When you think about it, it does make sense, because the resource is basically shared between post.index and post.edit, which is why it is fetched and stored in their parent. Let’s go through this in detail:

visit /posts/1
router basically does App.Post.find(1) and assigns that to the content of PostController
template post is rendered
template post.index is rendered in post’s outlet

And when you transition to /posts/1/edit, the only thing that changes is the leaf route. You still keep the same App.Post model, because it belongs to the parent PostRoute, not to the leaf PostIndexRoute. But this has a drawback. You’re not able to directly access the content from the post.index template, since it doesn’t belong to its controller. That’s where needs come in.

App.PostIndexController = Ember.ObjectController.extend({
+  needs: ["post"]
+})
+

and in the post/index template, you can access the content like this

{{controllers.post.content}}

By specifying the need, Ember will make sure that it gives you the right PostController instance with its content set to the right value.
