vore microcomputers logo

The Simple but Awful Result of Making a Miscalculation: Week 3 of VoreStation Development

2023-08-27

written by Niko Chow-Stuart (Lead Developer)

drawing of the vap mascot attempting to lift a heavy briefcase, with papers falling out. the briefcase is labeled 0.7 KiloBytes.

quick note before i start: as mentioned on our social media pages, we've decided to change the (official) rate of these blog posts to one every two weeks. this is mainly because i feel that i'm not able to accomplish enough work within a single week to warrant a blog post, in addition to the fact that we've already been missing the weekly deadline for the past two weeks.
anyways, onto the actual post!

almost the instant after posting the previous blog post, i posted a follow-up on my personal social media pages, stating that i had made a miscalculation in the design of the filesystem. unfortunately, this miscalculation was a rather large one, and as such, i've had to spend the past two weeks reworking the parts of the filesystem that were affected by it.

as you may have already seen, the miscalculation in question was regarding the design of direct and indirect block pointers in the inodes. originally, i had decided on a design of either 12 direct blocks, or 9 direct blocks and 3 indirect blocks. however, after a very short amount of running the numbers through a calculator (which should've been done way earlier), i realised that this design would (assuming a relatively average block size of 512 bytes) limit the maximum file size to less than a megabyte. this is obviously undesirable, and as such, i decided to rework things to be a bit more future-proof.

the new design is a bit more complex, in that it essentially allows for an unlimited number of direct blocks and indirect blocks. the way that this works is that each indirect block can specify (for each pointer) whether the pointer is an indirect block or a direct block. this means that the maximum file size is now limited by the maximum number of indirect datablocks, which theroetically could be up to around 2^64 blocks, although reaching this limit would require an incredibly large amount of storage space.

one potential issue with this design is that it requires a lot of work to index the blocks of a file. say you wanted to access the second block of a file, you would have to read the first indirect block, then depending on the pointer type of the first pointer listed, either skip it (if its a datablock), or read that datablock just to figure out if the second pointer is stored in that datablock or the next one. this also would mean that you'd need to essentially do this forever until you reach the datablock you want, which is a big performance hit. however, i've decided to add one extra feature to the pointer listing to account for this.

each pointer is accompanied by a u8 value, which is either 0 or 1. depending on that value, the pointer either points to a datablock or an indirect block. importantly, immediately after that u8 comes a 56-bit integer (thus in total, flags + u56 is a u64), which contains absolute number of datablocks that this indirect block points to (undefined if its not an indirect pointer). this means that, if you want to access the second block of a file, you can simply read the first indirect block, and then skip the first pointer if you know that the second block is not stored in that indirect block (i.e. if the u56 value is less than or equal to 1). this should hopefully drastically improve performance, while keeping the design relatively simple. (either that or i'll regret my confidence in however long until i can test it).

in an ideal implementation, blocks are layed out by attempting to create a tree of indirect blocks that is as balanced as possible. this means that, if you have a file that is 5 blocks long (assuming a block size that can contain only two blocks for the purpose of this example), the blocks will be layed out like this:
a graph showing the layout of blocks. one block is at the top, with two blocks below it. data blocks 1 and 2 are below the first block, data blocks 3 and 4 are below the second block. data block 5 is attached directly to the end of the root block
please note my uncertainty when i say this, but i'm pretty sure that this is how the implementation i've written works. in practice, it may actually add a useless indirect block in between the root and datablock 5, but i'm not sure. i'll have to look over the code again (and probably rewrite it to be better).

overall, this is pretty much all the work i've done over the past two weeks. trying to get the idea of this into my head was somewhat tricky at first, and as such that took me quite a bit of the implementation time. notably, a big issue with this implementation is that files can no longer easily be non-entirely updated via a journaled multiblock write. this is because of the complexity of this structure, and i may end up either changing the way the multiblock write works or adding another type of write to account for this.

all changes to the source code should be public by the time this is up, as always you can view it at our git.

that should be it for this week's blog post! i'm hoping that i'll be able to get back on track with the development of the filesystem, and hopefully by next time i'll have gotten the B-Tree implementation for directories working! (i'm not sure if i'll actually be able to get it working by then, but i'll try my best!)

donations to our libera pay are of course appreciated as always (:

besides that, i'll see you all in two weeks!


image of xenia, an anthropomorphic fox who was a contender for the linux mascot image made by @cathodegaytube!