Category Archives: 随便写写。

LaTeX note 1: how to draw B+ tree w/ TikZ

As promised, I should have updated here much more frequently. Well… things are sort of going out of control since I’ve started my grad journey. Anyways, I’m so ecstatic when getting a B+ tree straight out from LaTeX like this one below!!

I learned a great deal from this post but the B+ tree code provided does not resemble with the ones in the lecture notes (see an example below). That’s the ONLY post that I found online that talks about how to draw B+ tree in LaTeX. I’m aware that other talented people have also worked on this and provided .sty packages, but unfortunately that’s too much work and not for me 🙂 So here we go! A simple hands-on instruction on LaTeX B+ tree with TikZ!

↑ My guess is that the professor used special kind of software to draw this, otherwise it’s not easily achievable in LaTeX… based on my experience

What is B+ Tree?

I don’t want to make this post as an ancillary note of my course so won’t go into details here (pretending that I’m already an expert though I know nothing). Basically, it’s an index structure used in database management system which supports fast search and adjusts to database changes accordingly. Its predecessor is named B tree but it’s not in use anymore. You can find more details about B+ tree here (thanks Wiki!).

What is TikZ?

It’s a LaTeX package designed for graphics! It somehow reminds me of matplotlib from Python, but this one seems to be more powerful. This page contains basic examples to help you get started on TikZ (to be honest, I didn’t read it). There are some advanced graphs generated through TikZ as well; again, I only focus on the part that I’m interested in 🙂

What is LaTeX?

Please don’t ask me this, I supposed you already know or you won’t even have an interest in reading this article…

I’m in, what’s next?

To unlock all features from TikZ, simply add the package to your working tex project:

\usepackage{tikz}
\usetikzlibrary{positioning,shadows,shapes,arrows}

Let’s first draw a leaf node with order = 3 (that is, a node will contain 3 keys and 4 pointers – each key owns a pointer and the last one serves as the sequence pointer that connects leaves). I didn’t draw the sequence pointer out yet due to no other leaf existed at this moment:

\begin{center}
\begin{tikzpicture}
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\node {2\nodepart{two}3\nodepart{three}5};
\end{tikzpicture}
\end{center}

So the node representation is defined by the \tikzstyle{bplus} statement, especially with the rectangle split parts. It directly determines how many slots would be available on a given node. Since the order is 3, I put 3 here. If the order is 5, I shall update the number to be 5 accordingly. The other two commands that involved “empty part” are simply stating that empty slot should be drawn. That is to match the format I learned in class 🙂

After running the code section above, you should get a picture like this:

Not… too bad right?

Well, suppose that we would like to insert 7 to the B+ tree. Based on definitions, it’s now time to split. Four elements will be placed on two nodes with two elements in each. There will also be a non-leaf node, where pointers to the left refer to records that are strictly less than the element (<5 in this case) and pointers to the right refer to records that are greater than or equal to the key (>=5 in this case). The code below does the trick for you:

\begin{center}
\begin{tikzpicture}[node distance=60mm]
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\tikzstyle{level 1}=[sibling distance=60mm]
\node {5\nodepart{two}} [->;]
child {node (A) {2\nodepart{two}3}}
child {node (B) {5\nodepart{two}7}};
\draw [->] (A) -- (B);
\end{tikzpicture}
\end{center}

Notice that this time I specified sibling distance between nodes. That is, for nodes that are on the same level, how far you would like them to be apart. 60 mm is chosen based on Matthew Leingang‘s prior work 🙂 Feel free to change that to fit your need. There is an order in drawing nodes as well. Always draw the root node first (start with \node). Then draw the child nodes with the prefix child, enclosed by brackets. In order to show the sequence pointer, I also named the two nodes A and B and used the \draw functionality to connect them together. The final picture looks like this:

We are getting closer! Details on how to inserting more keys to the tree are omitted. Below is a sample code on how to draw the complete tree:

\begin{center}
\begin{tikzpicture}
\tikzstyle{bplus}=[rectangle split,rectangle split horizontal, rectangle split parts = 3, rectangle split empty part width=0.01mm, rectangle split every empty part={\hspace{0.25cm}}, draw]
\tikzstyle{every node}=[bplus]
\tikzstyle{level 1}=[sibling distance=60mm]
\tikzstyle{level 2}=[sibling distance=25mm]
\node {19\nodepart{two}} [->]
child {node {5\nodepart{two}11}
child {node (A) {2\nodepart{two}3}}
child {node (B) {5\nodepart{two}7}}
child {node (C) {11\nodepart{two}17}}
}
child {node {29\nodepart{two}}
child {node (D) {19\nodepart{two}23}}
child {node (E){29\nodepart{two}31}}
}
;
\draw [->] (A) -- (B);
\draw [->] (B) -- (C);
\draw [->] (C) -- (D);
\draw [->] (D) -- (E);
\end{tikzpicture}
\end{center}

I still haven’t figured out how to draw the pointers on the leaf level that point to actual records… but I believe this is good enough for the assignment purpose. (Feel like I typed a lot but didn’t really “teach” others how to draw it.) I wish that this post is helpful for fellow LaTeX users who may need to draw B+ tree as well. The code can be simply changed to adapt to more scenarios (more levels, higher orders etc). Feel free to let me know your thoughts on this!

Advertisements

浅谈如何提取 WordPress eXtended RSS (WXR) 中的信息

啊,然后我又很久没写博客了。明天要上班鸟,毕竟是第一份正儿八经全职工作,还是有点紧张外加期待的。这篇文也是拖了好久没写了,主要是在我分析完我的推特数据以后突然发现,我使用 WordPress based 的博客也有七年了(记录可以一直追溯到09年——再之前我也有用过yo2之类的 WP托管型博客,但是貌似我没有导出记录)。就突然心血来潮想看一下词云。(我知道WP有能提供词云的插件 ><)但托管在 WordPress.com 的博客有那么一点不一样,导出的数据格式不是普通的XML,而是WXR (WordPress eXtended RSS)。跟推特很好心的在打包的 zip file 里给你提供好 .csv 文件不一样,WordPress.com 的导出记录用文本编辑器打开来看就是长得很丑的……XML。所以就简单说下怎么提取数据啦。

Continue reading 浅谈如何提取 WordPress eXtended RSS (WXR) 中的信息

科学玩推特(3):基于主题模型的推文分析

真真是没想到,再次更新【科学玩推特】这个系列居然也是两个多月以后了。回美帝之后一度忙着找工,就没继续推文分析这项大工程。之前试图套 Topic Model (主题模型,下同)到推文上,但因为 Rwordseg 的中文分词太差劲所以就没弄成。好吧我总算是不再拖延了,也大概是学会了如何用搜索引擎吧= = 所有的源代码都可以在这里找到哦~

Continue reading 科学玩推特(3):基于主题模型的推文分析

香港没有旧情人

(我来占个坑,有空一定补完。不可避免的又是俗套的爱情小说了。题目灵感:聊天时候突然说到回国要飞哪个机场,对方说上海,我说香港。然后我突然接了句香港没有旧情人。灵感就突然来了……好吧,HK确实没有什么特别令人牵肠挂肚的人……所以显然是只能小说化了……敬请期待【泥垢【没人看你博客的好吗)

【怨妇口吻注意——】

Continue reading 香港没有旧情人