Adjusting scroll offset when document height changes
In an NSScrollView – and likely in scroll views in other systems – if the height of your document changes, you may need to adjust your scroll offset to compensate so that the content you see in your viewport doesn’t jump around.
Here are some ways your document height can change:
Laying out a paragraph that wasn’t previously laid out.
Changing the font size or zoom level.
Reloading the document due to a change on disk.
Editing the document. This is multifaceted. Consider:
Two views (windows, tabs, etc.) onto the same document.
Edits by someone else in a collaborative editor.
This is a surprisingly complex problem, and there’s no one-size-fits-all solution.
Different types of documents may call for different rules: in a drawing app, when you zoom, you may want to maintain the center point in your view. That is to say, what was at the center before the zoom should remain at the center afterwards.
In a text editor, you probably want to maintain the position of the upper left corner. But even this is more complicated than it first appears. If you’re zooming or resizing text, that will cause each paragraph to reflow, and text that was at the beginning of each line fragment (each visual line in a wrapped paragraph) will change. How do you even determine what the “same location” is?
From here on out, we’ll just be considering text editors.
There are two variables to pay attention to: the height of the document itself, and the scroll offset – how far down in the document we’re scrolled, usually measured from the upper left corner of your viewport (the part of the document you can see).
NSScrollView will not change the content offset in response to a change in document height. If the document height grows, your content offset will remain the same. This means the percent of the document you’ve scrolled through will go down. If you were halfway through the document before, you may now only be 45% of the way through. If the document height shrinks, everything is reversed.
It’s not enough to know that the height of the document has changed. You need to know where the height changed within the document.
If there’s a change in document height that occurs below your viewport, you won’t experience any ill effects. If the document grows, the scroll nub will get smaller and move up towards the top of the scroll bar, but there will be no visual glitches in the viewport. If the document is tall enough relative to the window size, the changes in the scroll bar may not even be visible.
If the change in document height occurs above your viewport, you will notice a glitch. Specifically, if the document height grows, it will be as if you had instantaneously jumped up towards the top of the document. The change is jarring.
The fix is to adjust the content offset by the amount that the height changed. If the document got 10px bigger, and it happened above your viewport, you need to increase the content offset by 10px, and then it will look like nothing changed. Easy.
The implementation details are a bit more complicated. When laying out a text document, the unit of work is generally a paragraph (i.e. the text between two newline characters). If you’re in a programmer’s text editor, you might call this a line even though it takes up multiple visual lines on the screen. This means the most basic info you might have is “the height of paragraph 23 changed from 50px to 60px”.
But even that isn’t enough to figure out if you need to adjust the content offset. If the bounding box of the paragraph is entirely above the viewport (paragraph.maxY <= viewport.minY), then it’s easy. Apply the adjustment.
If the paragraph contains viewport.minY, the question becomes: in which line fragment in the paragraph did the height change occur? If the line fragment is above the viewport, you need to apply the adjustment, but if it’s inside the viewport you shouldn’t. I thought you might have to specially deal with a line fragment that partially overlaps (i.e. contains viewport.minY), but it turns out that’s not necessary – if you can see any portion of the line fragment that caused the height change, you don’t apply an adjustment.
In short, if the height change occurred visually within the viewport or below it, don’t do anything. If the height change occurred above the viewport, apply the adjustment.
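Those rules boil down to a small pure function. Here’s a minimal sketch in Swift (the names are hypothetical, not real API):

```swift
// Sketch of the adjustment rule. Given the change in a paragraph's height
// and the bottom edge of the line fragment that caused it, return the
// amount to add to the scroll offset.
func scrollAdjustment(heightDelta: Double, changedFragmentMaxY: Double, viewportMinY: Double) -> Double {
    // Only adjust if the changed fragment is entirely above the viewport.
    // If any portion of it is visible (or it's below), leave the offset alone.
    changedFragmentMaxY <= viewportMinY ? heightDelta : 0
}
```

In an NSScrollView, you’d then apply the non-zero result to the clip view’s bounds origin after updating the document view’s height.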
If you’re doing contiguous layout – laying out the document starting at the top all the way down through the viewport – scrolling up generally shouldn’t cause any layout, and your document height won’t change. But contiguous layout is expensive, especially for large documents.
Instead, you might do non-contiguous (i.e. viewport-based) layout. That is to say, only laying out the text that you can see in the viewport. This is a lot faster, but it means you don’t actually know the height of the document¹, so you have to estimate. As you move the viewport and lay out paragraphs, you can replace the estimated height of each paragraph with the actual height. This is how you can get into a situation where scrolling up causes your document height to change: grab the scroll bar and quickly scroll to the bottom, missing some paragraphs, and then start scrolling up with your trackpad.
With collaborative editing, things get even weirder. Let’s say you have two carets in your viewport. The upper one is your friend’s, who’s editing the document remotely, and the lower one is yours. With the rules we have so far, if your friend starts typing, we won’t apply any adjustment. Every time your friend types enough to add another line fragment, everything below it will get pushed down, including your caret.
But we probably don’t want this! If you’re in the middle of writing, it would be pretty annoying to have your caret slowly move down the page while other people make edits. So here’s another rule: if you’re in a collaborative editing context, your carets are privileged over other people’s carets. If someone is typing into your viewport, and your caret is in the viewport too, we should anchor to the top of your caret instead of anchoring to the top of the viewport. That way, as your friend adds new line fragments, the paragraph they’re editing will seem to grow upwards, rather than down, and your caret will stay in the same location. To do this, we have to apply the adjustment any time a paragraph above your in-viewport caret is changed by someone else’s caret, even if the affected line fragment is inside the viewport.
Of course, if you don’t have a caret in the viewport, we go back to the original rules.
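Here’s a sketch of that decision rule, combining the caret-anchoring case with the default viewport rule. All of the names here are made up for illustration:

```swift
// Hypothetical sketch of the anchoring rule for collaborative edits.
struct EditContext {
    var changeMaxY: Double     // bottom of the line fragment that changed
    var viewportMinY: Double   // top of the viewport
    var ownCaretY: Double?     // y-position of our caret, if it's in the viewport
    var isRemoteEdit: Bool     // did the change come from someone else's caret?
}

// Returns true if the scroll offset should be adjusted to compensate.
func shouldAdjustOffset(_ ctx: EditContext) -> Bool {
    if ctx.isRemoteEdit, let caretY = ctx.ownCaretY {
        // Anchor to our caret: adjust for any remote change above it,
        // even if the changed fragment is visible in the viewport.
        return ctx.changeMaxY <= caretY
    }
    // Default rule: adjust only if the change is entirely above the viewport.
    return ctx.changeMaxY <= ctx.viewportMinY
}
```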
I don’t actually know if this is a good idea because I’ve never seen it in action. Furthermore, what about other types of changes in the viewport above your caret that don’t come from collaborative editing? I’m not sure! I’m not even sure what might cause those changes, but it’s worth thinking about.
There’s one more collaborative editing wrinkle: what if you’re in the same situation as above, but instead of a caret in the viewport, you have a selection? In that case, I think we should probably go back to the previous rule: your friend typing in the viewport will move your selection down with the rest of the text. I’m not even sure why I think this, and I could be wrong, but my gut says it’s right. Carets are more important than selections.
The good news is that if you’re not making a collaborative editor (I’m not), you can sidestep some of these issues. But having multiple cursors or having the same document open in multiple tabs, where each tab has its own scroll state, carets, etc., has many of the same complexities as collaborative editing – though I don’t think there’s anything analogous to our rule about your caret being more important than theirs.
Update 2023-07-24: There’s one obvious thing I forgot to add. You should be able to treat height changes due to layout of new paragraphs as if they came from edits in the first line fragment. That way you don’t need separate code paths for layout vs edits.
Technically, with contiguous layout you might not know the height of the document either, but you will know the exact height of the portion of the document that starts at the top and ends at the bottom of your viewport, which is enough to deal with the jumping when you scroll up. ↩︎
After over a week of banging my head against ropes, I finally feel like I’m over the hump, and things are getting easier.
I’ve been reading through Xi, swift-collections, and Jesse Grosjean’s SummarizedCollection. David Scot Taylor’s YouTube series on B-trees has been another good resource. Volume 3 of The Art of Computer Programming also has a good description of B-trees on pages 482–489, but it’s dense, and some of the conventions – like how it defines the tree’s order, and its leaves being virtual and not actually existing – are different from most other materials I found.
The big question I’ve been wrestling with, besides “what is a B-tree and how does it work,” is “how do I efficiently concatenate two B-trees, where all elements of the left tree are less than all elements of the right tree?” I had a hunch that there was a way to do this in O(log n) time, but the code in Xi and swift-collections was pretty inscrutable, and there aren’t many good resources online, probably because B-trees are usually used for file system and database indexes, where I think concatenation is a rare operation. As a comparison, the naive way of doing it, by iterating over all the elements of the shorter tree and inserting them into the longer tree, is probably O(m log n), where m is the number of elements in the shorter tree, and n is the number of elements in the longer one. When both trees are huge, this is untenable.
Most of the libraries I looked at had a different way of doing the concatenation. The one that ended up clicking was the way Xi does it, so I’m doing that. I’ll probably write up a description of how this works at some point soon.
The other thing that’s made this difficult is figuring out how to implement a copy-on-write collection in Swift. The convention in Swift is that everything, including collections, is passed by value, but multiple copies of a collection can share a reference to the same underlying storage. The storage is only copied when you write to it.
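The standard pattern uses `isKnownUniquelyReferenced(_:)` to decide when a copy is actually needed. Here’s a minimal sketch (a toy wrapper around an array, not the real rope):

```swift
// A class provides the shared, reference-counted storage.
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

// The struct has value semantics; copies share storage until one of
// them is mutated.
struct COWArray {
    private var storage = Storage([])

    var elements: [Int] { storage.elements }

    mutating func append(_ value: Int) {
        // If another copy also references this storage, copy it first.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements)
        }
        storage.elements.append(value)
    }
}
```

Copying the struct is O(1); the O(n) copy of the storage only happens on the first mutation after a copy.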
Caching rendered text
One of the mistakes I often make while programming is abstracting too soon. I’ll think “let’s solve this problem by implementing data structure X,” only to find that the interface X provides doesn’t actually let me solve the problem. The good news is I know that people have written text editors with ropes before, but I was still a bit nervous about whether I was going to finish building a rope, only to find that I can’t solve my problems with it. This morning while walking the dog, I wrote up some notes that convinced me I’m on the right track, and have reproduced them below.
Here’s some context that might make them more comprehensible:
A CALayer is part of Core Animation. For our purposes, you can think of it as a texture stored in GPU memory that gets composited into a window.
A CTLine is part of Core Text. It represents a line of text that’s been laid out and can be drawn into a canvas like a CALayer.
A Rope is a data structure for storing large amounts of text. It has O(log n) indexing, insertion, and deletion. Even better, it has O(1) copying because it’s immutable by default. Ropes get these properties by storing their data in trees.
Compare this to a standard string stored as a fixed-length array of characters, where all these operations are O(n)¹.
In their original construction, ropes were balanced binary trees, where each leaf contained a portion of the full text, and each internal node contained the sum of the lengths of the text stored in its children.
These days, it seems most people build ropes out of B-trees². B-trees are similar to self-balancing binary search trees, but instead of having only two children, each node can have N children, where N can be very large (e.g. 1024). Unlike AVL trees or red-black trees, which require a rebalancing step after insertion and end up only approximately balanced, B-trees are always perfectly balanced by construction: all leaves are at the same depth, and nodes split when they get too full. Rather than growing down from the leaves, B-trees grow up from the root – when the root splits, a new root is added above it.
“Summary” means the data that’s stored in each internal node of the rope. By default, that’s the total length of the text contained in the node’s children, but it can be other things too, like the number of newline characters in its children.
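To make the summary idea concrete, here’s a toy rope sketch. It’s binary for brevity – a real implementation would use wide B-tree nodes – and the names are mine:

```swift
// A summary aggregates data about a subtree: here, total text length
// and newline count. Summaries combine with +.
struct Summary {
    var length = 0
    var newlines = 0

    static func + (lhs: Summary, rhs: Summary) -> Summary {
        Summary(length: lhs.length + rhs.length, newlines: lhs.newlines + rhs.newlines)
    }
}

indirect enum Rope {
    case leaf(String)
    case node(left: Rope, right: Rope, summary: Summary)

    var summary: Summary {
        switch self {
        case .leaf(let text):
            // Leaves compute their summary from the text itself.
            return Summary(length: text.count, newlines: text.filter { $0 == "\n" }.count)
        case .node(_, _, let summary):
            // Internal nodes cache the combined summary of their children.
            return summary
        }
    }

    static func concat(_ left: Rope, _ right: Rope) -> Rope {
        .node(left: left, right: right, summary: left.summary + right.summary)
    }
}
```

Because the tree is immutable, an edit rebuilds only the nodes along one root-to-leaf path and their summaries, which is where the O(log n) bounds come from.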
Here are my notes from this morning:
What am I trying to do?
Make the text editor fast.
Things I think are probably slow:
Drawing CTLines into a CALayer
Calculating line breaks (not actually sure how expensive this is)
I definitely want to cache the CALayers. I think caching line breaks is easy with ropes, so I’ll do it. I’m not sure if it’s necessary to cache CTLines separately from the layers that contain their rasterized contents.
To cache CALayers, I need to be able to invalidate the layers when the text in their paragraph changes, and remove layers that are no longer in preparedContentRect.
The obvious first attempt at invalidating layers when text changes is mapping textRange to layer.
If the paragraph is edited but doesn’t change length, we won’t invalidate.
If the paragraph changes length, we have to update the ranges of all following paragraphs.
I’m pretty sure I can deal with the first problem by doing the same two-step edit that Xi does: when performing an edit, generate a set of deltas from the existing rope, and then apply them to create a new rope. We might be able to use these deltas to invalidate the cache: i.e., an edit that leaves a paragraph the same length is a Replace(text in range) where length(text) = length(range), or alternatively a Delete(range) followed by Insert(text at range.start). Either way, we know that any layers in range should be invalidated from the cache.
I believe the second problem can be solved using a rope: if the summary is the length of each paragraph (rather than the paragraph’s range in the document), we should be able to find the associated layers for a given text range by walking down the tree in O(log n) time, and updates will still be O(log n) time because lengths (unlike ranges) just require summing as we go up the tree rather than offsetting all the following ranges, which would be O(n).
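Here’s a toy sketch of that lookup: internal nodes store only the total length of their children, so finding the paragraph containing a character offset is a walk down one path, and an edit only touches the summaries along one path rather than every following range. (Hypothetical, much simplified – real nodes would be wide B-tree nodes.)

```swift
indirect enum Tree {
    case leaf(length: Int)                   // one paragraph, storing its length
    case node(children: [Tree], length: Int) // length = sum of children's lengths

    var length: Int {
        switch self {
        case .leaf(let length): return length
        case .node(_, let length): return length
        }
    }

    var leafCount: Int {
        switch self {
        case .leaf: return 1
        case .node(let children, _): return children.reduce(0) { $0 + $1.leafCount }
        }
    }

    // Index of the paragraph (leaf) containing `offset`, counting leaves
    // left to right. Assumes 0 <= offset < length.
    func paragraphIndex(at offset: Int) -> Int {
        var node = self
        var offset = offset
        var leavesSkipped = 0
        while case .node(let children, _) = node {
            for child in children {
                if offset < child.length {
                    node = child
                    break
                }
                // Skip this subtree and keep searching to the right.
                offset -= child.length
                leavesSkipped += child.leafCount
            }
        }
        return leavesSkipped
    }
}
```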
I’m less certain about only caching what’s in the viewport. We need a method like invalidateLayers(outside: rect). If we store the heights of each paragraph in the cache, we should be able to quickly find the start index and end index of the rect and then get a sub tree containing only the inside.
I think this implies that we can’t have a single rope that stores all the stateful data. We need a separate rope just for the layer cache. We also need to duplicate either the line heights or the paragraph lengths in the layer cache, and we need to store an offset for the beginning of the layer cache so that offsets in the layer cache (which represents just a portion of the document) are compatible with offsets in the line height cache (which represents the entire document). The reason we can’t store the layer cache in the same rope as the heights and other layout information is that implementing invalidateLayers(outside: rect) without slicing the tree requires iterating over most of the tree to find the layers to invalidate. That’s O(n), while slicing is O(log n).
For deciding whether to cache the CTLines, the questions are:
How much time does it take to lay out a CTLine?
Is there anything besides rendering the text that we need CTLines for?
How much memory do CTLines take up?
For the first question, I’ll have to measure. I know text shaping is a complicated process, but I wouldn’t be surprised if computers are fast enough that we don’t need to cache.
The reason for the second question is that we’ll already be caching the rendered text, so if we only need the CTLines for drawing, and we’re caching the results of drawing, there’s no reason to cache the CTLines too. The only thing I can think of that we might need CTLines for is calculating the height of the entire document in the background after loading it (this would give us the actual document height rather than just an estimated height). This is probably not necessary, at least to start. Assuming our font is monospaced, the number of glyphs per line is constant. We should be able to accurately estimate the height of each paragraph as ⌈glyphsInParagraph ÷ glyphsPerLine⌉ × lineHeight.
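That estimate is a one-liner. A sketch, with hypothetical names – `glyphsPerLine` would come from the viewport width divided by the font’s advance width:

```swift
// Monospaced height estimate: round up so a partially-filled last line
// still counts, and make sure an empty paragraph still occupies one line.
func estimatedHeight(glyphCount: Int, glyphsPerLine: Int, lineHeight: Double) -> Double {
    let lines = max(1, (glyphCount + glyphsPerLine - 1) / glyphsPerLine)
    return Double(lines) * lineHeight
}
```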
For the third question, CTLines are pointers to a struct of unknown size, so the only way to figure this out is empirically, by creating a ton of them and measuring memory usage. Given the answer to the second question though, I don’t think we’ll need to cache CTLines. Furthermore, even if we did want to calculate a fully accurate document height in the background, we could do that without caching the CTLines – just create one CTLine at a time, ask it for its height, and then throw it away.
You might think indexing in normal strings is O(1), but that’s only true for fixed-length encodings like ASCII or UTF-32. UTF-8, which is the most common text interchange format, and UTF-16, which is used by Core Text, are both variable-width, which means you need to scan the entire string to get to the nth code point or extended grapheme cluster. ↩︎
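Swift makes this cost visible in its API: you can’t do integer arithmetic on `String.Index`, and offsetting an index walks the underlying storage.

```swift
// Offsetting a String.Index is O(n): Swift has to walk the string's
// variable-width storage to find grapheme cluster boundaries.
let s = "héllo 👋"
let i = s.index(s.startIndex, offsetBy: 6)
assert(s[i] == "👋")
```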
Technically they’re probably more like B+trees, which only store values in the leaves, not in the intermediate nodes, but B-trees and B+trees are both key-value stores, and ropes store sequences, so neither analogy is totally accurate. ↩︎
I got text input working a while ago. Unsurprisingly, it didn’t take a day, like I estimated. More like three or four.
The big problem is that it’s slow. The reason is the way I’m doing document height estimation.
You need to know the height of your document to correctly render your scroll bars. Most generally, to know the height of a text document, you need to lay it out from top to bottom. But that’s expensive, especially for large documents. It’s cheaper to only layout the text you’re looking at, but then you don’t know your document height, and you’re back to square one.
To get around this, you can estimate the height of the document. Leaving out some of the details: I’m storing an array of heights, y-offsets, and character ranges for each line in the document. Each line starts with an arbitrary height estimate of 14 points, and I replace the estimate with the actual height when I lay out the line. This is plenty fast, even though I have to update all of the following y-offsets every time I lay out a line.
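A minimal sketch of that bookkeeping (hypothetical names, character-range updates omitted) shows why the y-offset update is O(n) in the number of lines:

```swift
struct LineInfo {
    var height: Double  // estimated or actual line height in points
    var yOffset: Double // cumulative offset from the top of the document
}

// Replace a line's estimated height with its actual height, then shift
// every following y-offset by the difference. This loop is the O(n) part.
func updateHeight(_ lines: inout [LineInfo], at index: Int, to newHeight: Double) {
    let delta = newHeight - lines[index].height
    lines[index].height = newHeight
    guard delta != 0 else { return }
    for i in (index + 1)..<lines.count {
        lines[i].yOffset += delta
    }
}
```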
When I edit the text, however, I have to update all of the following text ranges. This is slow enough that there’s noticeable lag with each character typed, and all the slowness comes from String.index(_:offsetBy:), which takes an opaque String.Index and offsets it by an integer value. This is the case even though all my offset values are small – from the beginning of a line to the end of it. From some basic profiling, it seems that the majority of time is spent in [NSBigMutableStringIndex characterAtIndex:], specifically in some locking functions (os_unfair_lock_lock_with_options and os_unfair_lock_unlock). I don’t really know why this is, and it’s somewhat frustrating. In general, I find string processing in Swift to be opaque and frustrating.
My height estimation code is also unprincipled and hard to work with. The three or four days I spent on text input were almost entirely spent on updating height estimates correctly.
So armed with these problems, I ran head first into what is probably a bad idea: implementing a rope. A rope stores a string in a balanced search tree and is able to do most operations, including insertion and deletion, in O(log n) time. Even better, you can store metadata (e.g. number of line breaks, estimated line heights, etc.) along with the actual characters of your string, and keep it up to date in O(log n) time as well. A year ago, I spoke with Raph Levien about this and he specifically recommended against using a rope because they’re complicated and hard to implement. But here I am.
For the past week, I have been banging my head against ropes and B-trees with little success. I need to change something about how I’m approaching this problem, but I’m not sure what yet. All I know is that reading books, papers and other people’s source code in an undirected way doesn’t seem to be working. Hopefully I’ll be able to find a better approach.
I’m writing a text editor. Until I can think of a better name, it’s called Watt.
Watt is for my own use. I want to have something that I can use as a daily driver, but that’s a tall order given that I’m used to having syntax highlighting, auto-complete, debugging, project-wide search, GitHub Copilot, etc. We’ll see how far I get.
One way I can simplify is by not supporting extensions. Because Watt is just for me, if there’s a feature I need, I can build it in.
I use a Mac, so Watt is a Mac app. A GUI editor is more complicated than a CLI editor, but it’s what I like, and it’ll give me an excuse to learn more about text layout, rendering, and editing. I’m using Core Text, Apple’s low-level text layout and rendering API. There are higher level frameworks, most interestingly TextKit 2, but after spending time with it last year, I think it’s a bit too buggy. Plus, using Core Text lets me learn more about text layout and rendering.
After ~3 weeks of work, Watt can render text into a scroll view at acceptable speed, and supports text selection and caret movement with the mouse. The next step is actually editing text.
I’m hoping to have text editing, including support for system-provided IMEs, done by tomorrow. That would normally be unrealistic, but I can scavenge most of the code from last year’s TextKit 2 experiments.