Behind the Scenes: Doge

Almost a year ago, while playing a game and talking to someone on Discord, I noticed the in-game overlay that Discord provides to its users. I had been using Discord for years by then, but only then did I realize how convenient it is. It shows the people in your voice channel (VC), who's speaking and who's not (so you know who that one person with their mic inside their nose is), who's muted, who's live, etc. The problem with convenient things like that is that once they're gone, the "normal" flow of things feels inconvenient. And once you close the game, the overlay is gone: no more top-of-the-screen live information, no matter what you're doing. Every time you want to check on your VC, you alt+tab to Discord and scroll to your VC, only to find you have the wrong server open. I don’t like inconveniences, especially ones that can be mitigated, so I set out to make an overlay for Discord. One that works on top of every window, all the time, no matter what you're doing.

About a year later, this adventure has resulted in a convenient utility called Doge, a global overlay for Discord. It has been an interesting journey, from an idea to a product, from knowing nothing to displaying VC information in an overlay. Now I want to take you through that journey, with the same excitement I felt. This post is slightly geared towards technical readers, although others will be able to read most of it just fine; feel free to skim the technical sections. To reduce hyperlink clutter in the text, all images (except an obvious few) are clickable and point to the site shown. Brace yourselves, we’re taking off on this crazy journey, one step at a time.

Talk or Spy?

The core objective here is to retrieve (live) information about the current user's VC, if they're in one. There are usually two ways to achieve this. The first, which I call the "talk" method, is to interact with Discord: "talk" to it via its API (public or private), a client library, or an SDK if one is available, to get the required information. Basically, any method that can be considered "official" or "legal". At first this looked viable, because I could just make a Discord bot, add it to the server with the user's VC, and listen for channel events.

Oftentimes, there won't be any viable way to just "talk" to the app or service. Maybe there's no understandable API, maybe it's too complex, maybe it requires authentication that's too complicated to replicate, or maybe it does or uses something only it knows about. In cases like these, we can cheat a little. Since the app or service isn't friendly, instead of trying to "talk" to it, we can "spy" on it. In our case, we could take a screenshot of Discord's window and analyze it to extract the required information; repeat this every few seconds and you've got a live information source. Or we could inject a DLL into Discord's process and listen for window events. Sounds too complex? Because it is. Another option is to inject code into Discord to add some functionality, or any of the many other crazy things you can do to spy on it rather than talk to it.

Clearly, spying is very complicated. And that makes sense: you're going out of your way to get information from an app that doesn't want to share it willingly, so of course it's complicated. So one way to go about it is to look for ways to talk to the app and give up if there aren't any, disregarding spying altogether. That's understandable, because sometimes it just isn't worth going that far. Whether spying makes sense depends on the specific situation. So, in my case, did I choose to talk or spy?

GlobalOverlay: The Bot

Since I could just make a bot and listen to channel events, the choice seemed pretty simple (it wasn't quite, as we'll soon see), so I made a bot. It was named GlobalOverlay, pretty self-explanatory. And it worked as expected, well, most of the time. It was still buggy, but I didn't spend much time fixing it because I didn't have the motivation. Why? Because there was one big problem with this approach: the bot needed to be in the server with the user's VC. While that might be possible for personal servers and friends' servers, it is far from ideal for public ones. Plus, it only worked for me; multiple people couldn't use it simultaneously.

I had previously made an app called VCNotifier that notified me of VC events via Windows toast notifications, using a Discord bot to listen for channel events. So I just had to change how it notified me of events (an overlay instead of notifications), and it was done.

Since it didn't turn out to be a viable solution at all, now what?

Talking Without a Bot

Having understood that relying on a bot to provide channel information wasn’t viable, I had to search for another way of talking to Discord, one that doesn’t use the bot API or involve Discord bots at all. This is when things started getting interesting. Because there does exist a way, it’s just, let’s say, obfuscated. I’m not sure whether that’s intentional, but you’ll soon see what it is.

So I started hunting for a method in Discord’s documentation, something that would allow me to talk to Discord without a bot. Luckily, I did find something!

Discord RPC

Going through the documentation, I found out about Discord’s RPC.

RPC stands for Remote Procedure Call. Simply put, it’s like calling a function (or procedure), but in a different app, hence the word remote. You talk to the other app by calling its functions via RPC, passing any required arguments, and the functions carry out their actions, optionally returning some data. It's similar to calling a function like getChannelInformation(), except that the function lives in another app.
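To make the idea concrete, here's a toy, single-process sketch in Python of what the receiving side of an RPC does. It looks up a procedure by name, calls it with the supplied arguments, and sends back the result. The `getChannelInformation` name comes from the example above. The procedure table, the JSON request shape, and the returned data are all hypothetical, and a real RPC adds a transport between two processes.

```python
import json


def get_channel_information(channel_id):
    # Hypothetical procedure that lives in the "remote" app.
    return {"id": channel_id, "name": "General", "members": 3}


# Name -> function table that the "server" side exposes to callers.
PROCEDURES = {"getChannelInformation": get_channel_information}


def handle_request(raw: str) -> str:
    """Decode a JSON request, call the named procedure, and encode the reply."""
    request = json.loads(raw)
    procedure = PROCEDURES.get(request["cmd"])
    if procedure is None:
        return json.dumps({"error": f"unknown command {request['cmd']}"})
    result = procedure(**request.get("args", {}))
    return json.dumps({"data": result})


# The caller only sends a name and arguments; the work happens on the other side.
reply = handle_request(json.dumps({"cmd": "getChannelInformation", "args": {"channel_id": "42"}}))
print(reply)
```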

The way it works is that the Discord client, i.e. the Discord app running on your computer, hosts an RPC server that listens for other programs wanting to call functions in Discord. You talk to Discord by calling those functions (commands), optionally getting some output back. The commands exposed functionality to see the current user's VC, the other people in it, whether they're muted or deafened, who’s speaking and who’s not, and many other things. There were also events you could subscribe to, which notified you when someone joined, left, etc. In other words, I could get live information from Discord reliably by just talking to it. As you’d expect, this was exactly what I needed. It was a win!
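To show why those events are enough for an overlay, here's a rough Python sketch that keeps a local picture of a VC up to date as events arrive. The event names (`VOICE_STATE_CREATE`, `VOICE_STATE_UPDATE`, `VOICE_STATE_DELETE`, `SPEAKING_START`, `SPEAKING_STOP`) come from Discord's RPC documentation. The payload fields used here are a simplified assumption, and `VoiceChannelState` and `Member` are hypothetical classes, not Doge's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    user_id: str
    name: str
    muted: bool = False
    deafened: bool = False
    speaking: bool = False


@dataclass
class VoiceChannelState:
    members: dict = field(default_factory=dict)  # user_id -> Member

    def apply(self, evt: str, data: dict) -> None:
        """Update the local picture of the VC from one dispatched event (simplified payloads)."""
        if evt in ("VOICE_STATE_CREATE", "VOICE_STATE_UPDATE"):
            user = data["user"]
            voice = data.get("voice_state", {})
            member = self.members.get(user["id"]) or Member(user["id"], user["username"])
            member.name = data.get("nick") or user["username"]
            # Treat both self-mute and server mute as "muted" for display purposes.
            member.muted = voice.get("self_mute", False) or voice.get("mute", False)
            member.deafened = voice.get("self_deaf", False) or voice.get("deaf", False)
            self.members[user["id"]] = member
        elif evt == "VOICE_STATE_DELETE":
            self.members.pop(data["user"]["id"], None)
        elif evt in ("SPEAKING_START", "SPEAKING_STOP"):
            member = self.members.get(data["user_id"])
            if member is not None:
                member.speaking = evt == "SPEAKING_START"


vc = VoiceChannelState()
vc.apply("VOICE_STATE_CREATE", {"user": {"id": "1", "username": "alice"}, "voice_state": {"self_mute": True}})
vc.apply("SPEAKING_START", {"user_id": "1"})
print(vc.members["1"])
```

With a model like this, rendering the overlay amounts to redrawing `members` whenever `apply` runs.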

All of this sounds too good to be true, right? I had found such a perfect solution so quickly, and I could just use it without any further problems. Well, perhaps unsurprisingly, it was indeed too good to be true.

It was in private beta, a way of saying “we’re still testing this feature and it is only available to us”. I tried to follow the docs, but the very first step is, obviously, connecting to the RPC server, and it rejected all my requests. I tweaked them and experimented a little, to no avail. No program could talk to Discord via its RPC. And that was it: no more motivation for a global Discord overlay. Discord had given me a sliver of hope just to disappoint me soon after.

But wait, that “Playing VALORANT” on Discord…

So, there was no way to interact with Discord. Okay, accepted. But then I saw these commands related to Rich Presence.

Rich Presence, aka Activity in Discord’s lingo, is that “Playing X” you see in a user’s profile when they’re playing a game, or the "Listening to Spotify" when they're listening to a song. It's called presence or activity because it tells you about the user's current activity, and rich because it provides other related information, such as the current song being played, how much time has elapsed, the number of people in their party, etc. From now on, I use the words Rich Presence and activity interchangeably.

What was surprising is that it shows some information that only the game would know, like the number of people in the current party, the name of the current map, and sometimes even buttons to join and spectate!

It makes sense that when I click that join button, it signals the game on my PC, which then sends a join request, in-game, to that player. The only way that's possible, you guessed it, is if the game is talking to Discord. And it has to be via RPC, because that’s what those activity-related commands and events I saw in the RPC docs were for. This meant there was still hope: if these games can talk to Discord via RPC, so can I, using the commands and events that the RPC provides.
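For a feel of what a game might send, here's a Python sketch that builds a Rich Presence update as a JSON message. The `SET_ACTIVITY` command and its `pid`/`activity` arguments follow the shape described in Discord's RPC documentation. The `build_set_activity` helper, the party ID, and the example values are hypothetical.

```python
import json
import os
import time
import uuid


def build_set_activity(details: str, state: str, party_size: int, party_max: int) -> str:
    """Build a SET_ACTIVITY command like a game might send (helper and values are hypothetical)."""
    message = {
        "cmd": "SET_ACTIVITY",
        "args": {
            "pid": os.getpid(),  # the process this activity belongs to
            "activity": {
                "details": details,  # first line, e.g. the game mode
                "state": state,  # second line, e.g. the party status
                "timestamps": {"start": int(time.time())},  # lets Discord show elapsed time
                "party": {"id": "hypothetical-party-id", "size": [party_size, party_max]},
            },
        },
        "nonce": str(uuid.uuid4()),  # lets the sender match Discord's reply to this request
    }
    return json.dumps(message)


print(build_set_activity("Competitive", "In a party", 3, 5))
```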

(Yes, that guy has been playing Call of Duty for about 16 hours.)

API, Gateway, RPC, and Game SDK

I went back to the docs, retried everything, and experimented even more, but still nothing. I scoured the docs for every little thing, but over time things just kept getting messier. Discord has multiple ways for apps to talk to it: the bot API, which is how bots talk to it; the Gateway, which is how bots listen to real-time events like messages; and RPC, the topic of the past few paragraphs, which is how apps and games tell Discord about themselves, the lobby, the map, and everything else. Interestingly, the docs also had a whole section titled “Rich Presence”.

That section references yet another way to talk to Discord - the Game SDK, the latest way to update a user’s activity. It also has a ton of other features (ask me how I know!). There used to be another method, a library called Discord-RPC, but it's now deprecated and remains only as a legacy option.

All these things, all these different ways to talk to Discord, and yet I couldn’t find one simple thing: how exactly do apps tell Discord about themselves? The documentation was a mess, had some outdated information, and didn't lead me to a solution at all, so I had to get my hands dirty and try the only way I had left.

Scavenging Source Code

You know what they say: “talk is cheap, show me the code”. So that’s what I did. Searching the docs didn’t help; instead, it confused me even more, bombarding me with information I didn’t really need. So I started looking into the source code of libraries and apps that did manage to talk to Discord. I went through the sources of multiple apps, mostly extensions and plugins, like a VS Code extension for Rich Presence. I also looked into Discord-RPC, the previously mentioned library from Discord itself that was deprecated in favor of the Game SDK. Its GitHub page has something called the “Hard Mode” documentation, which explains the inner workings of the library and how exactly it communicates with Discord, allowing you to “roll your own client”.

The Hard Mode docs were interesting, giving major insight into the internals of Discord-RPC and Discord’s RPC infrastructure in general.

Even though they were good, they weren’t satisfactory. Following them just led to another dead end, because the first step, connecting to Discord’s RPC server and authenticating the current user, gave an error. I searched the internet, forums, Q&A sites, and everywhere else I could. I did find a post asking why Discord’s RPC hadn’t been released yet, along with a few related things, but it seemed like this was just a niche topic, probably forgotten by Discord itself. I didn’t see many people interested in it: no one asked questions about it, and no one tried to figure out how to use it. Everything was already fading away, and this just felt like the end.

Going Deeper: IPC

⚠️ Warning: This section is a bit technical.

Demotivated and on the brink of defeat, I decided to do the only thing I had left: dig into the actual source code of Discord-RPC. I didn’t have much hope, though. I had already tried about ten different methods, looked at tons of documentation, and ridden waves of hope followed by disappointment. Even the Hard Mode documentation, which felt like the answer to everything, led to nothing. But looking at the code of Discord-RPC and of the apps that had managed to do what I wanted, I finally (finally!) found the part that actually communicates with Discord. Even though it wasn't formally documented anywhere, it was easy to wrap my head around, because nothing too complicated was going on in there. Here’s a summary of how it all works:

Because the words client and server might get confusing from here on, until the end of this section "Discord" means the Discord client, "app" (or "your app", "an app", etc.) means the application using the IPC, and "user" means the human using Discord.

Discord opens a pipe named discord-ipc-0 on the user's device. The docs say it can be any pipe from discord-ipc-0 to discord-ipc-9, and in some cases it has to be a different one, for example when both Discord and Discord Canary are running on the same device. In my experience, though, it's always the first one (-0) when just one instance of Discord is running. All communication happens through this pipe, which stays open as long as Discord is running. The packets sent through it follow a protocol that isn't documented anywhere and is only available as an implementation in the sources.
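So a client has to try the candidate pipes in order and use the first one it can connect to. The sketch below only lists the candidates. The Windows path format and the environment variables checked on Linux and macOS (`XDG_RUNTIME_DIR`, `TMPDIR`, `TMP`, `TEMP`, falling back to `/tmp`) reflect what the open-source Discord-RPC library appears to do. Treat them as assumptions, not a documented contract. `candidate_pipe_paths` is a hypothetical helper.

```python
import os
import sys


def candidate_pipe_paths() -> list:
    """List the places a Discord IPC pipe may live, in the order a client would try them."""
    if sys.platform == "win32":
        # Windows named pipes live in the \\?\pipe\ namespace.
        return [rf"\\?\pipe\discord-ipc-{i}" for i in range(10)]
    # Elsewhere the "pipe" is a Unix domain socket in a runtime or temp directory (assumed order).
    base = "/tmp"
    for var in ("XDG_RUNTIME_DIR", "TMPDIR", "TMP", "TEMP"):
        if os.environ.get(var):
            base = os.environ[var]
            break
    return [os.path.join(base, f"discord-ipc-{i}") for i in range(10)]


for path in candidate_pipe_paths()[:2]:
    print(path)
```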

There are five types of packets sent through the pipe, identified by the opcode in each of them. Here's an overview (<opcode>-<type>):

0-HANDSHAKE: The first packet sent by an app containing its client ID and some other data.
1-FRAME: The data packets. These contain the main data transferred throughout the communication.
2-CLOSE: Sent at the end to indicate that the app is disconnecting.
3-PING and 4-PONG: Even though it seems like these would be used as a heartbeat, they're not, and as of right now they appear to be unused.
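Here's a Python sketch of that packet layout. Based on the open-source implementations, it assumes that every packet is an 8-byte header followed by a UTF-8 JSON payload. The header holds the opcode and the payload length, each a little-endian unsigned 32-bit integer. `encode_packet` and `decode_packet` are hypothetical helper names.

```python
import json
import struct
from enum import IntEnum


class Opcode(IntEnum):
    HANDSHAKE = 0
    FRAME = 1
    CLOSE = 2
    PING = 3
    PONG = 4


# Assumed header: opcode, then payload length, both little-endian uint32.
HEADER = struct.Struct("<II")


def encode_packet(opcode: Opcode, payload: dict) -> bytes:
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(opcode, len(body)) + body


def decode_packet(packet: bytes):
    """Split one complete packet into its opcode and decoded JSON payload."""
    opcode, length = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated packet")
    return Opcode(opcode), json.loads(body.decode("utf-8"))


packet = encode_packet(Opcode.HANDSHAKE, {"v": 1, "client_id": "123"})
print(decode_packet(packet))
```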

The normal flow is: the user starts Discord, Discord opens the pipe, an app connects to the pipe and sends a HANDSHAKE packet, then communicates normally via FRAME packets until it's done, at which point it sends a CLOSE packet and disconnects. There are some other things to be aware of, as mentioned in the Hard Mode documentation here.
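Here's the same flow as a minimal Python sketch. It makes four assumptions: the 8-byte header layout used by the open-source implementations, a handshake payload of `{"v": 1, "client_id": ...}`, a FRAME reply (a `READY` dispatch) when the handshake succeeds, and a CLOSE packet when it's rejected. To keep it testable anywhere, it works on any connected socket. `socket.socketpair()` stands in for the real pipe, with the other end playing Discord's role. All function names are hypothetical.

```python
import json
import socket
import struct

HEADER = struct.Struct("<II")  # assumed: opcode, payload length (little-endian uint32)
HANDSHAKE, FRAME, CLOSE = 0, 1, 2


def send_packet(sock: socket.socket, opcode: int, payload: dict) -> None:
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(HEADER.pack(opcode, len(body)) + body)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("pipe closed")
        data += chunk
    return data


def recv_packet(sock: socket.socket):
    opcode, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return opcode, json.loads(recv_exact(sock, length).decode("utf-8"))


def connect(sock: socket.socket, client_id: str) -> dict:
    """Send HANDSHAKE and wait for the first reply (assumed to be a READY dispatch frame)."""
    send_packet(sock, HANDSHAKE, {"v": 1, "client_id": client_id})
    opcode, payload = recv_packet(sock)
    if opcode != FRAME:
        raise ConnectionError(f"handshake rejected: {payload}")
    return payload


def disconnect(sock: socket.socket) -> None:
    send_packet(sock, CLOSE, {})
    sock.close()


client, fake_discord = socket.socketpair()
send_packet(fake_discord, FRAME, {"cmd": "DISPATCH", "evt": "READY", "data": {}, "nonce": None})
print(connect(client, "hypothetical-client-id")["evt"])
print(recv_packet(fake_discord))  # the handshake as "Discord" received it
disconnect(client)
fake_discord.close()
```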

The gist of this Inter-Process Communication (IPC) is in the FRAME packets, which I'll refer to as just "frames" or "messages" from now on. These frames carry all the data transferred in either direction: the commands the app sends to Discord, the responses to those commands, and the events (with their associated information) raised in Discord and dispatched to the app. The question now is: what is the format of all these messages? Luckily, the IPC uses the same format the RPC would have used, as described in the RPC docs. So even though RPC never came out, we can talk to Discord over the IPC using almost the same code that would have talked to the RPC. The only difference is handling the non-FRAME packets.
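Here's what a couple of those messages might look like, built in Python. The `cmd`, `args`, `evt`, and `nonce` fields and the `GET_SELECTED_VOICE_CHANNEL` and `SUBSCRIBE` commands come from Discord's RPC documentation. The helper functions and the channel ID are hypothetical. The sketch also leaves out the authorization step the RPC docs require before most commands.

```python
import json
import uuid


def command(cmd: str, args: dict = None, evt: str = None) -> dict:
    """Build an RPC-style message; the random nonce ties Discord's reply to this request."""
    message = {"cmd": cmd, "args": args or {}, "nonce": str(uuid.uuid4())}
    if evt is not None:
        message["evt"] = evt  # SUBSCRIBE/UNSUBSCRIBE name the event they refer to
    return message


def is_reply_to(request: dict, response: dict) -> bool:
    """Replies echo the request's nonce; dispatched events carry a null nonce instead."""
    return response.get("nonce") == request["nonce"]


# Ask which VC the user is in, then subscribe to join events for that channel.
get_vc = command("GET_SELECTED_VOICE_CHANNEL")
subscribe = command("SUBSCRIBE", {"channel_id": "1234"}, evt="VOICE_STATE_CREATE")
print(json.dumps(get_vc))
print(json.dumps(subscribe))
```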

At this point, I was content. I had gone down the only path I had left and found a solution: a proper way to talk to Discord and get (live) information from it. My motivation for making a global overlay returned and I was back on track. An overlay seemed just around the corner. All I had to do was find a wrapper around the IPC, or more specifically, a .NET library. (I was developing the overlay in C#. A library made for C# is actually usable from a few other languages too, grouped under the umbrella term ".NET languages". If that seems confusing, just read ".NET" as "C#".)

A Wrapper for Discord's IPC: DiscordIPC

I went onto the internet once again, searching for libraries that wrapped Discord's IPC and made it easier to use. Must be easy, right? We have libraries for pretty much everything now, so it should take less than a minute to find one for this. Well, not really. I was already expecting some difficulty finding one because this isn't JavaScript, but I gradually realized something else, something much more... weird. There wasn't a single library out there, not just for .NET but for any runtime or language, that wrapped Discord's IPC. I searched GitHub, blogs, forums, and everywhere imaginable, but still couldn't find one. It confused me for a very long time. I couldn't believe there was no library, in any language, for such a basic task, when there are useless libraries for far more trivial things.

On the other hand, I found hundreds of libraries and wrappers that allowed manipulating a user's Rich Presence using the same IPC (indirectly, through a deep tree of dependencies). Surprisingly, all those libraries used only the commands and events related to Rich Presence and left all the others untouched. It baffled me for a long time before I made peace with the fact that I wouldn't find anything.

It wasn't a dealbreaker because (perks of being a developer) I could just write my own. So I did. It was supposed to be part of Doge, like a module. It started small, as always, but kept growing larger and more complicated, to the point where I had to separate it into its own project. I had been loosely planning to do that since the beginning, figuring someone else might want to use it for their own apps and I might as well make it generally available, and this was the trigger. I guess this is the right time to answer the question from earlier: I chose to "talk" to Discord.
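To give an idea of what such a wrapper does at its core, here's a deliberately tiny, single-threaded Python sketch. It sends a command, waits for the reply carrying the same nonce, and hands any events that arrive in the meantime to registered callbacks. `TinyIpcClient` is hypothetical and is not DiscordIPC's API. Two details are assumptions based on the open-source implementations and the RPC docs: the 8-byte header layout, and failed commands being answered with an `ERROR` event. A real wrapper would read on a background thread and support many requests at once.

```python
import json
import socket
import struct
import threading
import uuid

HEADER = struct.Struct("<II")  # assumed: opcode, payload length (little-endian uint32)
FRAME = 1


class TinyIpcClient:
    """Hypothetical single-threaded IPC wrapper sketch (not DiscordIPC's API)."""

    def __init__(self, sock: socket.socket):
        self.sock = sock
        self.handlers = {}  # event name -> list of callbacks

    def on(self, evt: str, callback) -> None:
        self.handlers.setdefault(evt, []).append(callback)

    def _send(self, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.sock.sendall(HEADER.pack(FRAME, len(body)) + body)

    def _recv_exact(self, n: int) -> bytes:
        data = b""
        while len(data) < n:
            chunk = self.sock.recv(n - len(data))
            if not chunk:
                raise ConnectionError("pipe closed")
            data += chunk
        return data

    def _recv(self) -> dict:
        _opcode, length = HEADER.unpack(self._recv_exact(HEADER.size))
        return json.loads(self._recv_exact(length).decode("utf-8"))

    def request(self, cmd: str, args: dict = None, evt: str = None):
        """Send a command and block until its reply, dispatching events seen meanwhile."""
        nonce = str(uuid.uuid4())
        message = {"cmd": cmd, "args": args or {}, "nonce": nonce}
        if evt is not None:
            message["evt"] = evt
        self._send(message)
        while True:
            response = self._recv()
            if response.get("nonce") == nonce:
                if response.get("evt") == "ERROR":
                    raise RuntimeError(response.get("data"))
                return response.get("data")
            self._dispatch(response)

    def _dispatch(self, response: dict) -> None:
        for callback in self.handlers.get(response.get("evt"), []):
            callback(response.get("data"))


if __name__ == "__main__":
    client_sock, server_sock = socket.socketpair()
    fake = TinyIpcClient(server_sock)  # reuse the framing helpers to play Discord's role

    def play_discord():
        req = fake._recv()
        fake._send({"cmd": "DISPATCH", "evt": "SPEAKING_START", "data": {"user_id": "1"}, "nonce": None})
        fake._send({"cmd": req["cmd"], "data": {"id": "1234"}, "nonce": req["nonce"]})

    worker = threading.Thread(target=play_discord)
    worker.start()
    client = TinyIpcClient(client_sock)
    client.on("SPEAKING_START", lambda data: print("speaking:", data["user_id"]))
    print(client.request("GET_SELECTED_VOICE_CHANNEL"))
    worker.join()
```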

ℹ️ Shameless self-promotion ahead

The library is now called DiscordIPC. It's written in C#, targets .NET Standard 2.0, is available on NuGet and GitHub, and wraps the whole IPC instead of just the functionality related to Rich Presence. It is constantly changing, since Discord's RPC isn't out yet and the message formats keep changing. Nevertheless, it is stable enough for moderate-sized projects. Feel free to suggest improvements or contribute :).

Doge: The Inception

Having done everything necessary, and with a library at hand to consume Discord's IPC, it was time to start making the overlay. I started thinking about how it would work, getting into the documentation to understand the format of the messages, thinking about the structure of the app and its code, and many other things a developer would do at the beginning of a new project.

Something I learned during the development of DiscordIPC and Doge is how difficult and frustrating it is to debug asynchronous, and in general concurrent, code. It made me appreciate the debugger Visual Studio gives you and the myriad features at your disposal, like the Parallel Stacks and Tasks windows. It also makes me really respect the engineers behind this whole system and their incredible work to make debugging, and development in general, much easier for us developers. I also learned a lot about Visual Studio, .NET, the nuances of C#, and many other things.

I had a few things about Doge in mind. One of the main objectives was to make the overlay look as close to Discord's overlay as possible. I like to think I achieved that, although you're the judge of that. But that also meant replicating its functionality, including customizability, just like Discord. Doge does have settings similar to Discord's overlay settings, and they're a bit more flexible. As of writing, Doge isn't mature yet and has a long way to go. I'll be adding many more features. Most will be related to customizing the overlay, and others to how it works: making it faster and more seamless, improving UX, etc.

Some Thoughts on the UI

Along the way, I had some thoughts about the UI of the settings window. I considered replicating Discord's theme there, making it look like you're changing Discord's own settings, but it felt like too much, so I dropped the idea. What I really wanted was to make it look like a Windows 10 app, using the Fluent Design System. It's the signature UI design of Windows 10, what separates the UI of classic desktop apps from that of Windows apps. Built-in Windows apps like Settings use Fluent Design; it's what makes an app look like a "modern" or "Windows [10]" app. Everything in this paragraph also applies to Windows 11, including the Fluent Design System. When I started this project I didn't have Windows 11; I only installed it a few months into development. But my thoughts remain the same.

For fellow .NET developers: I didn't know about WinUI when I began this project. I thought the only way to use Fluent UI in an app was through UWP. About midway through, I learned about WinUI 3, but it was still confusing, a new technology, and I was unsure whether I should use it here. Now, as I'm writing this, with .NET MAUI released and using WinUI under the hood on Windows, I'd say it's possible I'll move to WinUI sometime in the future (if I have the motivation to do it). Until then, Doge remains a classic WPF app.

Reason Behind the Name 'Doge'

Somewhere along the way, I had to name the project: something that makes sense and is easy to say. By the time I had decided on a name, only the "easy to say" part had survived. There's a chain of reasons behind the name Doge. Initially, I had planned to name it GlobalOverlay, and I did, for the bot-based solution. But by now that seemed a bit too generic and lacked a brand. GlobalOverlay became Global Overlay for Discord. That became its initials, GOD, which reversed became DOG, which reminded me of the famous Kabosu, or, as many of you might know her, Doge. This famous dog gave birth to the most famous meme of 2013, a contender for the best meme of the decade, and even a cryptocurrency!

As you might've already guessed, DOG changed to Doge, and it stuck. It had a "meme-y" feel to it, relevant to its target community - Discord users, so it was a win-win situation. It seemed to be a perfect choice and I decided to name the project Doge.

App Testers and Whitelisting

Everything went normally for a while, until near the very end, when I realized a few things about "privileged" features, which I'll talk more about in this section. They make things a little inconvenient for users when the app hasn't been approved by Discord, as in my case. Considering Doge was made to get rid of inconvenience, it's very ironic that this had to happen.

A bit before releasing Doge, I invited a person to test it, as a form of pre-release access. When that person tried to connect their Discord to Doge, it didn't show any errors but still failed to connect. I wondered about it for a while before realizing it might have something to do with Discord's "privileged" features. Discord's APIs (not just the bot-related ones, but everything, be it RPC, the Gateway, or anything else) have some features that aren't available to all apps. An app needs Discord's approval to use them. One example is privileged gateway intents, which require Discord's approval for your bot to keep using them once it's in 100 or more servers.

The whole process is described in Discord's FAQ here, but in short: once you feel your app needs whitelisting, you reach out to Discord asking for approval, and they look into it and respond. About a week later, if you're lucky and your app follows their guidelines, it gets whitelisted. In the case of a bot, you can see a little tick mark on its bot badge.

It turns out that at least one feature Doge uses is privileged, though I'm not exactly sure which. Unfortunately for me, Doge hadn't even been released yet, let alone become popular enough for Discord to consider approving it, so I shied away from asking for approval. Going back to my early tester: how could they use Doge if it wasn't approved for those features? And even more importantly, how was it working for me?

Apparently, when a Discord user makes an app (which is, of course, tied to their account), they can use it without any problems, no matter what it does or which APIs it accesses. That's how it should be; otherwise they would need to get an unfinished app approved before continuing its development, which isn't very sensible. But what happens when you want to expand your reach and bring your friends in to test your app? For that, you add them as app testers.

You add them using their Discord username and discriminator, they get an email invitation to test your app, they accept it, and then they can use the app just like you do. I added my friend as an app tester, and they were able to use it normally. This went on for a few weeks: they kept using Doge, a few flaws came to light, and I kept updating it until it was time for the formal release. So now, anyone else who wants to use it needs to be an app tester too. My apologies for making you go through all of that inconvenience, but until Doge (hopefully) gets verified, it'll remain that way. On the bright side, it's a one-time thing, and everything gets much more convenient after that.

However, like everything else in this journey of making a global overlay, it soon got worse. Shortly after I released the first version, I got a whitelist request from someone, so I entered their username to add them as a tester, and this is what I got:

I cannot describe in words how ecstatic I was after seeing this message :D. The username has been changed for obvious reasons; this image shows what I saw, using a sample account. Now, not only did people have to submit a whitelisting request and accept the invitation, they also needed to add me as a Discord friend first. My friend testing it was already my Discord friend, so I hadn't noticed this earlier. The whole process kept getting more and more inconvenient, and Doge became the very thing it swore to destroy.

As for my thoughts on all of this, I wish it were less of a hassle for users, but it is what it is right now. I'm going to request approval for Doge soon, and hopefully this whole process will become irrelevant. I understand it hurts user engagement, and I'm trying to improve the situation as best I can. Until then, you'll have to add me as a Discord friend, and I'll have to make you an app tester, before you can use Doge.

Further Development

The rest of the development doesn't have much of a story. It was a pretty standard code-debug loop for several months, with occasional design and structural changes along the way. Most of the exploration and hurdles were in communicating with Discord. The rest was just a GUI consuming the API provided by DiscordIPC. Of course, it's more complicated than that, but that's the gist of it.

As for the future, Doge will continue to be developed and improved, along with DiscordIPC, using new and better techniques. It will become more customizable and easier to use as time goes on. It's open-source, so if you're willing to contribute, you're very welcome. But for the time being, I've achieved my goal: I have my beautiful, convenient global overlay, with live information at the top of the screen!

The End

About a year later, as I write this, Doge is complete, tested, and released. It has made its debut on, well, its GitHub repository. Instructions for usage and support are in the README. If you're a Discord user who likes convenient things, Doge is just the thing for you; do check it out.

With Doge developed and released, I'm afraid this is where we part ways, reader. Thank you for coming with me on this journey. I hope you enjoyed it as much as I did (or probably a bit less, since I was much more excited about it), along with the adventures, the hiccups, and the revelations. I hope you learned something new. And I hope we meet again soon.

Cheerio!👋