Rita is an audio graph editor I've been working on this week, written with Rust on the backend, and React, TypeScript, and TailwindCSS on the frontend. The idea for Rita is that you'll be able to create graphs to process audio signals and create sounds. What differentiates Rita from other DSP or sound synthesis tools is that Rita aims to be a visual graph interface, in the same vein as Blender's material graphs or Unreal Engine's visual programming graphs. With a visual interface, anyone - musicians, music producers, people just getting interested in DSP - can create effects without having to become a programmer. As of now, Rita is in a very basic state, but there's enough at this point to talk about the architecture I've set up and the tech stack I'm using.
I've never been much of a frontend person, so my original plan was to use something like egui or imgui, which would handle most of the UI layout and styling work. But imgui's Rust bindings are a bit subpar, and the only graph editor add-on for egui I could find is archived. It quickly became clear that I was going to need something from the much larger web ecosystem. Thankfully, Tauri is a framework made for desktop apps that use Rust on the backend and HTML+CSS+TypeScript on the frontend. Rita is now using ReactFlow, which has been really easy to use: the library handles all node and edge state, and makes it simple to add connectors to nodes (using the Handle component). It also serializes itself to JSON, which can easily be sent to the Rust backend. The only tricky part of ReactFlow has been having to disable some of its built-in styling in favor of my own.
Speaking of styling, I'm using TailwindCSS for general styling and shadcn/ui for most components. shadcn/ui is something I hadn't used before this project, but it's been great for quickly building out the UI. What's unique about shadcn/ui is that it isn't built as a traditional library but is instead meant to be copy/pasted into your codebase. This makes it simple to change its components to your liking: I was able to modify the Card component, for example, to support the colored headers for the graph nodes.
Getting the JSON data for the graph is just a matter of calling JSON.stringify(rfInstance.toObject()), where rfInstance is a state value holding the ReactFlowInstance. When the user hits compile, the frontend invokes a Tauri command, sending this JSON to the backend. From here, processing the graph is done in three parts: parsing the JSON into more usable data, processing the graph one node at a time, and then sending the final buffer of audio samples to the playback thread.
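On the Rust side, the receiving end is a Tauri command. The sketch below is illustrative rather than Rita's actual handler (the command name and the helper functions are made up), but it shows the shape of the handoff:

// Hypothetical sketch of the backend entry point; names are illustrative.
// The frontend calls this with invoke("compile_graph", { graphJson }).
#[tauri::command]
fn compile_graph(graph_json: String, window: tauri::Window) -> Result<(), String> {
    // 1. Parse ReactFlow's JSON into backend types (serde, covered below).
    // 2. Process the graph node by node into one final sample buffer.
    // 3. Hand the buffer off to the playback thread.
    let graph = parse_graph_json(&graph_json).map_err(|e| e.to_string())?;
    let buffer = process_graph(graph, &window).map_err(|e| e.to_string())?;
    send_to_player(buffer);
    Ok(())
}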
JSON parsing is done using serde, which involves defining Rust structs that conform to how ReactFlow's JSON is laid out. These types are found in graph_json.rs.
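I won't reproduce graph_json.rs here, but the general shape is a handful of #[derive(Deserialize)] structs mirroring ReactFlow's toObject() output, roughly like this (field set trimmed down, and the real structs may name things differently):

use serde::Deserialize;

// Approximate mirror of ReactFlow's toObject() JSON.
#[derive(Deserialize)]
struct GraphJson {
    nodes: Vec<NodeJson>,
    edges: Vec<EdgeJson>,
}

#[derive(Deserialize)]
struct NodeJson {
    id: String,
    #[serde(rename = "type")]
    node_type: String,
    data: serde_json::Value, // per-node parameters (wave type, frequency, ...)
}

#[derive(Deserialize)]
struct EdgeJson {
    id: String,
    source: String, // id of the node the edge comes from
    target: String, // id of the node the edge goes to
    #[serde(rename = "sourceHandle")]
    source_handle: Option<String>,
    #[serde(rename = "targetHandle")]
    target_handle: Option<String>,
}

With structs like these, parsing is a single serde_json::from_str call.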
We then use this JSON data to create our AudioGraph struct:
pub struct AudioGraph {
    graph: petgraph::Graph<AudioGraphNode, AudioGraphEdge>
}

enum AudioGraphNode {
    Input {
        file_path: String
    },
    WaveGen {
        wave_type: WaveType,
        frequency: f32,
        amplitude: f32,
        seconds: f32
    },
    BinOp {
        operation: BinOp,
        on_short_a: ShortBehavior,
        on_short_b: ShortBehavior
    },
    Output {
        final_buffer: Vec<f32>
    }
}

struct AudioGraphEdge {
    from_idx: usize,
    to_idx: usize
}

#[derive(Clone)]
enum EdgeData {
    SoundBuffer {
        buf: Vec<f32>
    }
}
Above is the definition for the AudioGraph struct, along with the definitions of AudioGraphNode, AudioGraphEdge, and EdgeData for the full picture. Right now, AudioGraph only holds a petgraph::Graph field, which stores all the nodes and edges. AudioGraphNode is (as you can probably guess) the node type of the graph, and stores all the parameters a node may need (minus any inputs that come from other nodes) for processing.

At this point you may be wondering where graph inputs and outputs are stored, and why AudioGraphEdge is, somewhat unintuitively, storing two usizes and nothing more. A key design choice I made was that AudioGraphNodes should not store their inputs or outputs; the vectors containing inputs and outputs are instead passed in when it comes time for a node to be processed. This is AudioGraphNode::process's signature:
pub fn process(&mut self,
    window: &tauri::Window,
    my_idx: NodeIndex,
    inputs: &mut HashMap<NodeIndex<u32>, Vec<(NodeIndex<u32>, usize, usize)>>,
    outputs: &mut HashMap<NodeIndex<u32>, Vec<Option<EdgeData>>>,
    output_spec: F32FormatSpec
)
To get the simple stuff out of the way: window is passed in to send status messages to the frontend using Tauri events; my_idx tells the node what index it should be using for inputs and outputs; and output_spec has info telling the node the sample rate and number of channels its final sound buffer should have.
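For reference, here's a minimal sketch of what F32FormatSpec might hold based on that description (the field names are my assumption, not necessarily the real definition):

// Assumed shape of F32FormatSpec: just the output format parameters a node
// needs to size and fill its final buffer.
#[derive(Clone, Copy)]
pub struct F32FormatSpec {
    pub sample_rate: u32,
    pub channels: u16,
}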
inputs has an entry for each node (hence the NodeIndex as a key). The tuples are in the form (output_node_index, from_idx, to_idx): the node's (to_idx)th input is the (output_node_index)'s (from_idx)th output. This is what AudioGraphEdge is storing - in AudioGraph::process, we use this edge info to create the inputs hashmap that eventually gets passed to all these nodes. Similar to inputs, outputs has an entry for each node, and the ith Option<EdgeData> is that node's ith output. Is this probably over-complicated, and could it be simplified? Yeah; but I'm probably going to be rewriting how inputs and outputs are handled when I switch to a streaming model (see below), and if I do stick with this way of storing them, it has the benefit that we can very easily support multiple inputs, multiple outputs, and (pretty sure) one output being used in more than one input.
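To make that concrete, here's a small made-up example (node indices and contents invented for illustration): say WaveGen node 0 and Input node 1 both feed a BinOp node 2, with the wave on the BinOp's input 0 and the file on its input 1. The maps would be built up roughly like this:

// Hypothetical three-node graph, purely for illustration:
//   node 0 (WaveGen) --(from 0, to 0)--> node 2 (BinOp)
//   node 1 (Input)   --(from 0, to 1)--> node 2 (BinOp)
let mut inputs: HashMap<NodeIndex<u32>, Vec<(NodeIndex<u32>, usize, usize)>> = HashMap::new();
inputs.insert(NodeIndex::new(0), vec![]); // WaveGen takes no inputs
inputs.insert(NodeIndex::new(1), vec![]); // Input takes no inputs
inputs.insert(
    NodeIndex::new(2),
    vec![
        (NodeIndex::new(0), 0, 0), // BinOp input 0 <- node 0's output 0
        (NodeIndex::new(1), 0, 1), // BinOp input 1 <- node 1's output 0
    ],
);

// One slot per output, starting as None and filled in as each node runs.
let mut outputs: HashMap<NodeIndex<u32>, Vec<Option<EdgeData>>> = HashMap::new();
outputs.insert(NodeIndex::new(0), vec![None]); // WaveGen: one output
outputs.insert(NodeIndex::new(1), vec![None]); // Input: one output
outputs.insert(NodeIndex::new(2), vec![None]); // BinOp: one output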
Here's what AudioGraph::process - the 'main' method for graph processing - looks like. We topologically sort the graph (so nodes will always have their inputs ready before being processed), create the inputs and outputs maps, and then process the nodes one by one.
pub fn process(&mut self, window: &tauri::Window, output_spec: F32FormatSpec) -> Result<Vec<f32>, ()> {
    send_status(window, "Beginning graph processing.");

    // get sorted order
    let sorted = match petgraph::algo::toposort(&self.graph, None) {
        Ok(order) => order,
        Err(_) => {
            send_status(window, "Error sorting graph - cycle exists");
            return Err(());
        },
    };
    let _ = window.emit("set_total_nodes", sorted.len());

    // create inputs map
    let mut inputs: HashMap<NodeIndex<u32>, Vec<(NodeIndex<u32>, usize, usize)>> = HashMap::new();
    for idx in &sorted {
        let mut vec: Vec<(NodeIndex<u32>, usize, usize)> = Vec::new();
        for edge in self.graph.edges_directed(*idx, Incoming) {
            vec.push((edge.source(), edge.weight().from_idx, edge.weight().to_idx));
        }
        inputs.insert(*idx, vec);
    }

    // create outputs map
    let mut outputs: HashMap<NodeIndex<u32>, Vec<Option<EdgeData>>> = HashMap::new();
    for idx in &sorted {
        let output_num = self.graph[*idx].num_outputs();
        let vec: Vec<Option<EdgeData>> = vec![None; output_num];
        outputs.insert(*idx, vec);
    }

    // Processing (the fun step)
    let _ = window.emit("set_completed_nodes", 0);
    for (i, idx) in sorted.iter().enumerate() {
        self.graph[*idx].process(window, *idx, &mut inputs, &mut outputs, output_spec);
        let _ = window.emit("set_completed_nodes", i + 1);
    }

    // Find output and return the final buffer
    for idx in &sorted {
        match &mut self.graph[*idx] {
            AudioGraphNode::Output { final_buffer } => {
                return Ok(std::mem::replace(final_buffer, Vec::new()));
            },
            _ => continue
        }
    }
    Err(())
}
This code is all available in graph.rs. Right now the audio graph is what I would call an "ahead-of-time" processor: its final output is the full sound buffer, start to end. For short sound buffers this is fine, and the graph will finish processing in less than a second. When a graph starts using wav files, however, compilation becomes very slow and the final sound buffer takes up a lot of memory. I've been using a wav file of Marvin Gaye's "What's Going On" during testing (great song! give it a listen), and compilation takes about 10 seconds and over 100 MB of memory; and that's just one song! I'm planning to rewrite the AudioGraph to use a just-in-time/streaming model, where the sound buffer is processed a chunk at a time instead of in one whole go.
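I haven't designed that API yet, but roughly speaking, I imagine node processing ending up closer to something like this (purely a sketch of the direction, not code that exists in Rita today):

// Sketch only: a chunk-at-a-time processing interface. The trait name and
// chunk size are made up; the real streaming design will likely differ.
const CHUNK_SIZE: usize = 4096;

trait StreamingNode {
    // Fill `out` with the next chunk of samples, given the matching chunk of
    // each input, instead of producing the whole buffer up front.
    fn process_chunk(&mut self, inputs: &[&[f32]], out: &mut [f32; CHUNK_SIZE]);
}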
The functionality for audio playback is split into two structs: Player and Process. They're both initialized in Player::new:
pub fn new() -> Player {
    // ... boilerplate cpal code ...

    let (app_to_player_send, app_to_player_recv) = RingBuffer::<AppToPlayerMessage>::new(256);
    // A second ring buffer carries messages back from the audio thread to the app.
    let (player_to_app_send, player_to_app_recv) = RingBuffer::new(256);

    let mut process = Process::new(app_to_player_recv, player_to_app_send);

    let stream = device
        .build_output_stream(
            &config,
            move |data: &mut [f32], _: &cpal::OutputCallbackInfo| {
                process.process(data);
            },
            move |err| {
                panic!("Audio output stream failure: {}", err);
            },
            None,
        )
        .expect("Stream building failed");
    stream.play().expect("Stream playing failed");

    Self {
        stream,
        format,
        app_to_player_send,
        player_to_app_recv,
    }
}
Audio playback is a realtime process: something like reading a file or a memory allocation taking longer than usual can cause very noticeable glitches or skips in the audio, which is why most playback libraries (such as cpal, the library I'm using) run the actual output-buffer-filling code in a separate thread. Thus, the returned Player struct is kept on the app thread, and can be used to pass messages to the Process struct that exists on the stream thread. The message types are below (I omitted the messages that are unused as of now), and are passed between the two structs with a ring buffer.
pub enum AppToPlayerMessage {
    Play,
    Pause,
    UseBuffer(Vec<f32>)
}
Using the Player, the app can tell the process to pause or play the current sound buffer, or swap it out for a new buffer (such as one returned from AudioGraph::process) to start playing from.
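From the app side, that just means pushing a message onto the ring buffer; something like this (the method name is made up, and push stands in for whatever send method the ring buffer's producer half exposes):

// Hypothetical convenience method on Player for swapping in a new buffer.
impl Player {
    pub fn use_buffer(&mut self, buffer: Vec<f32>) {
        // If the ring buffer is full the message is silently dropped here;
        // a real implementation would want to handle that case.
        let _ = self.app_to_player_send.push(AppToPlayerMessage::UseBuffer(buffer));
        let _ = self.app_to_player_send.push(AppToPlayerMessage::Play);
    }
}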
Process::process (which gets called when the audio device needs new sound samples) will check for any incoming messages, and then just fill the output buffer with the necessary samples:
if let Some(current_buffer) = &self.current_buffer {
    if self.current_buffer_idx + data.len() >= current_buffer.len() {
        // We've hit the end of the buffer: copy the remaining samples, then
        // wrap around to the start to fill the rest of the output.
        let count_at_end = current_buffer.len() - self.current_buffer_idx;
        let count_at_start = data.len() - count_at_end;
        unsafe {
            std::ptr::copy(current_buffer.as_ptr().add(self.current_buffer_idx), data.as_mut_ptr(), count_at_end);
            std::ptr::copy(current_buffer.as_ptr(), data.as_mut_ptr().add(count_at_end), count_at_start);
        }
        self.current_buffer_idx = count_at_start;
    } else {
        // Plenty of samples left: copy a full output buffer's worth and advance.
        unsafe {
            std::ptr::copy(current_buffer.as_ptr().add(self.current_buffer_idx), data.as_mut_ptr(), data.len());
        }
        self.current_buffer_idx += data.len();
    }
}
I did a quick benchmark comparing iteration to fill the data buffer against ptr::copy, and I found ptr::copy enough faster to warrant using unsafe for it. The Process struct has a current_buffer field, which is what it's reading samples from, and a current_buffer_idx to keep track of where it should start reading.
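Putting those pieces together, Process looks roughly like this (a sketch, not the exact definition; the ring buffer field types depend on the crate in use, and the player-to-app message type isn't shown in this post):

// Rough sketch of the Process struct; field names are approximate.
pub struct Process {
    app_to_player_recv: Consumer<AppToPlayerMessage>, // messages from the app thread
    player_to_app_send: Producer<PlayerToAppMessage>, // status back to the app thread
    current_buffer: Option<Vec<f32>>,                 // buffer currently being played from
    current_buffer_idx: usize,                        // read position within that buffer
}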
As I said a bit earlier, the main change I want to make immediately is switching to a just-in-time model to reduce loading times and memory usage. I'm heading back to college in a few weeks and likely won't have much time to work on this; my hope is that with this change and some upgrades to the UI, Rita will be in a nice place to gradually add new nodes to, or just to come back to when I have more free time.
The full code for Rita is available on GitHub. If you have any feedback, suggestions, or questions, feel free to reach out!