LaTeX note 1: how to draw B+ tree w/ TikZ

As promised, I should have updated here much more frequently. Well… things are sort of going out of control since I’ve started my grad journey. Anyways, I’m so ecstatic when getting a B+ tree straight out from LaTeX like this one below!!

I learned a great deal from this post but the B+ tree code provided does not resemble with the ones in the lecture notes (see an example below). That’s the ONLY post that I found online that talks about how to draw B+ tree in LaTeX. I’m aware that other talented people have also worked on this and provided .sty packages, but unfortunately that’s too much work and not for me 🙂 So here we go! A simple hands-on instruction on LaTeX B+ tree with TikZ!

↑ My guess is that the professor used special kind of software to draw this, otherwise it’s not easily achievable in LaTeX… based on my experience

What is B+ Tree?

I don’t want to make this post as an ancillary note of my course so won’t go into details here (pretending that I’m already an expert though I know nothing). Basically, it’s an index structure used in database management system which supports fast search and adjusts to database changes accordingly. Its predecessor is named B tree but it’s not in use anymore. You can find more details about B+ tree here (thanks Wiki!).

What is TikZ?

It’s a LaTeX package designed for graphics! It somehow reminds me of matplotlib from Python, but this one seems to be more powerful. This page contains basic examples to help you get started on TikZ (to be honest, I didn’t read it). There are some advanced graphs generated through TikZ as well; again, I only focus on the part that I’m interested in 🙂

What is LaTeX?

Please don’t ask me this, I supposed you already know or you won’t even have an interest in reading this article…

I’m in, what’s next?

To unlock all features from TikZ, simply add the package to your working tex project:

\usepackage{tikz}
\usetikzlibrary{positioning,shadows,shapes,arrows}

Let’s first draw a leaf node with order = 3 (that is, a node will contain 3 keys and 4 pointers – each key owns a pointer and the last one serves as the sequence pointer that connects leaves). I didn’t draw the sequence pointer out yet due to no other leaf existed at this moment:

\begin{center}
\begin{tikzpicture}
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\node {2\nodepart{two}3\nodepart{three}5};
\end{tikzpicture}
\end{center}

So the node representation is defined by the \tikzstyle{bplus} statement, especially with the rectangle split parts. It directly determines how many slots would be available on a given node. Since the order is 3, I put 3 here. If the order is 5, I shall update the number to be 5 accordingly. The other two commands that involved “empty part” are simply stating that empty slot should be drawn. That is to match the format I learned in class 🙂

After running the code section above, you should get a picture like this:

Not… too bad right?

Well, suppose that we would like to insert 7 to the B+ tree. Based on definitions, it’s now time to split. Four elements will be placed on two nodes with two elements in each. There will also be a non-leaf node, where pointers to the left refer to records that are strictly less than the element (<5 in this case) and pointers to the right refer to records that are greater than or equal to the key (>=5 in this case). The code below does the trick for you:

\begin{center}
\begin{tikzpicture}[node distance=60mm]
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\tikzstyle{level 1}=[sibling distance=60mm]
\node {5\nodepart{two}} [->;]
child {node (A) {2\nodepart{two}3}}
child {node (B) {5\nodepart{two}7}};
\draw [->] (A) -- (B);
\end{tikzpicture}
\end{center}

Notice that this time I specified sibling distance between nodes. That is, for nodes that are on the same level, how far you would like them to be apart. 60 mm is chosen based on Matthew Leingang‘s prior work 🙂 Feel free to change that to fit your need. There is an order in drawing nodes as well. Always draw the root node first (start with \node). Then draw the child nodes with the prefix child, enclosed by brackets. In order to show the sequence pointer, I also named the two nodes A and B and used the \draw functionality to connect them together. The final picture looks like this:

We are getting closer! Details on how to inserting more keys to the tree are omitted. Below is a sample code on how to draw the complete tree:

\begin{center}
\begin{tikzpicture}
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\tikzstyle{level 1}=[sibling distance=60mm]
\tikzstyle{level 2}=[sibling distance=25mm]
\node {19\nodepart{two}} [->]
child {node {5\nodepart{two}11}
child {node (A) {2\nodepart{two}3}}
child {node (B) {5\nodepart{two}7}}
child {node (C) {11\nodepart{two}17}}
}
child {node {29\nodepart{two}}
child {node (D) {19\nodepart{two}23}}
child {node (E){29\nodepart{two}31}}
}
;
\draw [->] (A) -- (B);
\draw [->] (B) -- (C);
\draw [->] (C) -- (D);
\draw [->] (D) -- (E);
\end{tikzpicture}
\end{center}

I still haven’t figured out how to draw the pointers on the leaf level that point to actual records… but I believe this is good enough for the assignment purpose. (Feel like I typed a lot but didn’t really “teach” others how to draw it.) I wish that this post is helpful for fellow LaTeX users who may need to draw B+ tree as well. The code can be simply changed to adapt to more scenarios (more levels, higher orders etc). Feel free to let me know your thoughts on this!

Advertisements

2016·年终·展望·总结

卧槽感觉才写完15年的总结马上就又要写16年的了,这一年怎么过得那么快……不知道在哪里看到,说随着时光推移人会感觉时间过得越来越快,因为小的时候对这个世界还很陌生好奇,每天都被新鲜事物所刺激。大了见识到这世界的真相以后就没有那种心境了,只觉得每天都浑浑噩噩得过且过的能混一天是一天(笑)。有点不太知道自己今年应该写些什么,所以先放一下好了。可能我明天有感觉了就会过来尽情吐槽?(说来博客也少更了好多……主要是没空+懒+不知道要说些什么,毕竟我16年原本是打算把这个博客往技术向方向发展的 😛  好了我来填坑了,依照惯例,总结前一年发生了什么+希望下一年会发生什么。

Continue reading 2016·年终·展望·总结

浅谈如何提取 WordPress eXtended RSS (WXR) 中的信息

啊,然后我又很久没写博客了。明天要上班鸟,毕竟是第一份正儿八经全职工作,还是有点紧张外加期待的。这篇文也是拖了好久没写了,主要是在我分析完我的推特数据以后突然发现,我使用 WordPress based 的博客也有七年了(记录可以一直追溯到09年——再之前我也有用过yo2之类的 WP托管型博客,但是貌似我没有导出记录)。就突然心血来潮想看一下词云。(我知道WP有能提供词云的插件 ><)但托管在 WordPress.com 的博客有那么一点不一样,导出的数据格式不是普通的XML,而是WXR (WordPress eXtended RSS)。跟推特很好心的在打包的 zip file 里给你提供好 .csv 文件不一样,WordPress.com 的导出记录用文本编辑器打开来看就是长得很丑的……XML。所以就简单说下怎么提取数据啦。

Continue reading 浅谈如何提取 WordPress eXtended RSS (WXR) 中的信息