What are some cool and obscure data structure you know of?

lil_shi@programming.dev · 1 month ago

What are some cool and obscure data structure you know of?

myfavouritename@beehaw.org · 1 month ago

I get way more use out of Doubly Connected Edge Lists (DCEL) than I ever thought I would when I first learned about them in school.

When I want to render simple stuff to the screen, built-in functions like ‘circle’ or ‘line’ work. But for any shapes more complicated than that, I often find that it’s useful to work with the data in DCEL form.

xthexder@l.sw0.com · 1 month ago

I came up with a kind of clever data type for storing short strings in a fixed size struct so they can be stored on the stack or inline without any allocations.
It’s always null-terminated so it can be passed directly as a C-style string, but it also stores the string length without using any additional data (Getting the length would normally have to iterate to find the end).
The trick is to store the number of unused bytes in the last character of the buffer. When the string is full, there are 0 unused bytes and the size byte overlaps the null terminator.
(Only works for strings < 256 chars excluding null byte)

Implementation in C++ here: https://github.com/frustra/strayphotons/blob/master/src/common/common/InlineString.hh

FizzyOrange@programming.dev · 30 days ago

Interesting idea, but your trick is never really going to help (you can store up to 255 bytes instead of 254). Also always using 256 bytes for every string seems wasteful.

I think LLVM’s small string optimisation is always going to be a better option: https://joellaity.com/2020/01/31/string.html

xthexder@l.sw0.com · 30 days ago

22 characters is significantly less useful than 255 characters. I use this for resource name keys, asset file paths, and a few other scenarios. The max size is configurable, so I know that nothing I am going to store is ever going to require heap allocations (really bad to be doing every frame in a game engine).

I developed this specifically after benchmarking a simpler version and noticed a significant amount of time being spent in strlen(), and it had real benefits in my case.
Admittedly just storing a struct with a static buffer and separate size would have worked pretty much the same and eliminated the 255 char limitation, but it was fun to build.

FizzyOrange@programming.dev · 29 days ago

22 characters is significantly less useful than 255 characters.

You can still use more than 22 characters; it just switches to the heap.

nothing I am going to store is ever going to require heap allocations

I would put good money that using 256 bytes everywhere is going to be slower overall than just using the heap when you need more than 22 characters. 22 is quite a lot, especially for keys. ThisReallyLongKey is still only 17.

xthexder@l.sw0.com · 29 days ago

I don’t use 256 bytes everywhere. I use a mix of 64, 128, and 256 byte strings depending on the specific use case.
In a hot path, having the data inline is much more important than saving a few hundred bytes. Cache efficiency plus eliminating heap allocations has huge performance benefits in a game engine that’s running frames as fast as possible.

FizzyOrange@programming.dev · 28 days ago

having the data inline

It’s not as simple as that, depending on the architecture. Typically they would fetch 64-byte cache lines so your 128 bytes aren’t going to be magically more cached than 128 bytes on the heap.

Avoiding allocations may help but also maybe not. This is definitely in “I don’t believe it until I see benchmarks” realm. I would be really really surprised if the allocation cost was remotely bad enough to justify the “sorry your file is too long” errors.

anton@lemmy.blahaj.zone · 28 days ago

I came up with a kind of clever data type for storing short strings in a fixed size struct so they can be stored on the stack or inline without any allocations.

C++ already does that for short strings while seamlessly switching to allocation for long strings.

It’s always null-terminated so it can be passed directly as a C-style string, but it also stores the string length without using any additional data (Getting the length would normally have to iterate to find the end).

Also the case in the standard library

The trick is to store the number of unused bytes in the last character of the buffer. When the string is full, there are 0 unused bytes and the size byte overlaps the null terminator.

Iirc, that trick was used in one implementation but discontinued because it was against the standard.

(Only works for strings < 256 chars excluding null byte)

If you need a niche for allocated string you can get to 254 but the typical choice seem to be around 16.

runeko@programming.dev · 29 days ago

Dewey decimal

JackbyDev@programming.dev · 29 days ago

B trees are cool but not obscure necessarily. I didn’t learn about them in college. It sounds like binary tree and it’s similar but it’s different. It’s a data structure to take advantage of the way disk reads work.

Trigger2_2000@sh.itjust.works · edit-2 29 days ago

AVL Trees are neat AVL Tree

QueenMidna@lemmy.ca · 29 days ago

I’ve been knee-deep in these lately so I’m a big fan

Theta sketches!

Do you want to approximately count a large volume of items, but save the state for later so you can UNION , INTERSECT and even DIFF them? Then Thetas are right for you!

Or basically anything in the Apache Datasketches lbrary.

litchralee@sh.itjust.works · 1 month ago

IMO, circular buffers with two advancing pointers are an awesome data structure for high performance compute. They’re used in virtualized network hardware (see virtio) and minimizing Linux syscalls (see io_uring). Each ring implements a single producer, single consumer queue, so two rings are usually used for bidirectional data transfer.

It’s kinda obscure because the need for asynchronous-transfer queues doesn’t show up that often unless dealing with hardware or crossing outside of a single CPU. But it’s becoming relevant due to coprocessors (ie small ARM CPUs attached to a main CPU) that process offloaded requests and then quickly return the result when ready.

xthexder@l.sw0.com · 30 days ago

One cool trick that can be used with circular buffers is to use memory mapping to map the same block of memory to 2 consecutive virtual address blocks. That way you can read the entire contents of the buffer as if it was just a regular linear buffer with an offset.

Trigger2_2000@sh.itjust.works · 29 days ago

Not really a data structure per say, but just knowing LISP and the interesting structures it uses internally.

The results of LISP operations CAR, CDR, CADR and the other one I can’t remember now.

tiredofsametab@fedia.io · 1 month ago

Not necessarily obscure, but I don’t think Tris get enough love

muzzle@lemmy.zip · 1 month ago

Tris?

catchy_name@feddit.it · 1 month ago

They likely meant tries / prefix trees.

tiredofsametab@fedia.io · 1 month ago

Yep. I can’t spell today

muzzle@lemmy.zip · 1 month ago

Trie!

duckythescientist@sh.itjust.works · 1 month ago

I’m also not sure if this is obscure, but Bloom Filters! It’s a structure that you can add elements to then ask it if it has seen the element before with the answer being either “no” or “probably yes”. There’s a trade-off between confidence of a “probably yes”, how many elements you expect to add, and how big the Bloom Filter is, but it’s very space and time efficient. And it uses hash functions which always make for a fun time.

FizzyOrange@programming.dev · 1 month ago

Obscure 10 years ago maybe. These days there have been so many articles about them I bet they’re more widely known than more useful and standard things like prefix trees (aka tries).

lad@programming.dev · 1 month ago

Relevant xkcd

in Randall's words

Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.

Atlas_@lemmy.world · 29 days ago

Fibonacci heaps are pretty cool. Not used very often b/c they’re awful to implement, but better complexity than many other heaps.

Also Binary Lifting is closer to an algorithm than a data structure but it’s used in Competitive Programming a fair bit, and isn’t often taught: https://cp-algorithms.com/graph/lca_binary_lifting.html

And again closer to an algo tham a data structure, but Sum over Subsets DP in 3^n also has a cool little bit of math in it: https://cp-algorithms.com/algebra/all-submasks.html

idunnololz@lemmy.world · 30 days ago

I personally don’t think it’s that obscure but I have never seen this used in production code that I didn’t write: the linked hash map or ordered hash map.

3abas@lemmy.world · 29 days ago

I’ve used it twice that I can recall.

dustletter@piefed.blahaj.zone · 29 days ago

Skew binary trees. They’re an immutable data structure combining the performance characteristics of lists (O(1) non-amortized push/pop) and b-trees (log(N) lookup and updates)
They use a sequence of complete trees, cleverly arranged using skew binary numbers so that adding an element never causes cascading updates.
In practice they’re superseded by relaxed radix balanced trees.

xep@discuss.online · 1 month ago

How about this variation of linked lists? https://www.data-structures-in-practice.com/intrusive-linked-lists/

FizzyOrange@programming.dev · 30 days ago

Quite well known now because Rust-haters like to point out how they’re awkward to use in Rust.

Vorpal@programming.dev · 1 month ago

XOR lists are obscure and cursed but cool. And not useful on modern hardware as the CPU can’t predict access patterns. They date from a time when every byte of memory counted and CPUs didn’t have pipelines.

(In general, all linked lists or tees are terrible for performance on modern CPUs. Prefer vectors or btrees with large fanout factors. There are some niche use cases still for linked lists in for example kernels, but unless you know exactly what you are doing you shouldn’t use linked data structures.)

palordrolap@fedia.io · 1 month ago

An ultimately doomed one that existed in Perl for a while was the pseudohash. They were regular integer-indexed arrays that could be accessed as though they were hashes (aka associative arrays / dictionaries). They even made it into the main Perl books at the time as this awesome time saving device. Except they weren’t.

I did a quick web search just now and someone did a talk about why they weren’t a great idea and they tell it better than I could; Link: https://perl.plover.com/classes/pseudohashes/

The supplied video doesn’t have great sound quality, and it might be better to just click through the slides under Outline at the bottom there.