
Building LLM Applications for Companies When Their Data Is Private

Hello, I'm Egehan Yıldız, founder of an AI-education company called A Sip of AI, a senior at Bilkent University, and an on-demand freelance AI consultant. I won't bore you with a long intro, but apparently these days you have to share this kind of background for your writing to actually get read.

Even though I'll use this platform pretty rarely, I see it as a good chance to answer questions I hear constantly and to share my perspective on a few critical topics. Lately, several companies have approached people in my network about different LLM applications, and my close circle keeps asking how to build such systems for domains like law or defense.

Today I got the same question for the fourth time, and since AI applications and data privacy are among the most important issues of our time, it felt like the right moment to write this down.

The scenario

You're a developer; call yourself X. The owner of a company, Y, comes to you and explains their domain. Let's say it's the defense industry. They're only now laying the company's software foundation, and they want their own LLM. They consult you and say:

"Mr. X, we're a defense company doing such-and-such work. Lately we've wanted to develop a company-specific LLM. But our data is extremely important to us, so we absolutely cannot share it with anyone. Could you handle this project for us?"

You reply, "Of course we can", and now you go looking for a solution.

The two solutions everyone thinks of

1. Rent a cloud server (AWS)

Get a server from a provider like AWS where you can develop, test, and deploy the company's LLM, database, compute, everything. It's the first thing that comes to mind.

The problem: the moment you work with a third-party provider, no matter how many contracts you sign, you can never be completely certain the data won't leak. And throughout both development and production, Y keeps paying for every operation you run.

2. Host it yourself

You, as X, are a developer, not a hardware provider. To build locally you'd need serious hardware, starting with a very powerful machine. And once the app is in production, it has to run around the clock, which shows up month after month as electricity costs.

So self-hosting usually isn't the move. If you don't already own a powerful machine, it's simply not feasible, and if you buy one, a server running around the clock still isn't free. These issues might be negotiable, but they're cumbersome all the same.

AWS is a fine option, but pay-as-you-go bills become a real headache unless Y is a large defense contractor. And if the data is truly sensitive, you can never be entirely sure how it's handled, contracts or not.
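To make the cost side of this trade-off concrete, here is a rough break-even sketch. Every number in it (hardware price, power draw, electricity rate, cloud bill) is a made-up placeholder, not a quote from AWS or anyone else, and the function names are my own. It also ignores maintenance, cooling, and staff time, so treat it as back-of-the-envelope arithmetic only.

```python
import math
from typing import Optional


def monthly_power_cost(watts: float, price_per_kwh: float,
                       hours: float = 24 * 30) -> float:
    """Electricity cost of a machine that runs non-stop for a 30-day month."""
    return watts / 1000 * hours * price_per_kwh


def break_even_months(hardware_cost: float,
                      power_cost_per_month: float,
                      cloud_cost_per_month: float) -> Optional[int]:
    """Months until owning the machine becomes cheaper than renting.

    Returns None when the cloud bill is not higher than the power bill,
    in which case the hardware never pays for itself.
    """
    saving = cloud_cost_per_month - power_cost_per_month
    if saving <= 0:
        return None
    return math.ceil(hardware_cost / saving)


# Hypothetical inputs: a 1 kW server, 0.20 per kWh, a 12,000 machine,
# and a 1,500 monthly cloud bill (all in the same currency).
power = monthly_power_cost(watts=1000, price_per_kwh=0.20)
print(power)                                  # about 144 per month
print(break_even_months(12_000, power, 1_500))  # 9
```

With these placeholder figures the machine pays for itself in about nine months. Plug in Y's real quotes before drawing any conclusion.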

A feasible solution

So you took the project. Now what? There's a simple answer: AWS has plenty of open-source competitors.

Ubicloud is one of them, a company in the Y Combinator ecosystem (W24 batch) that lets you use their server stack much like AWS. I have no affiliation with Ubicloud; it's just an example from my recent research, and there are certainly other open-source alternatives out there too.

Here's what you'd tell Y:

"Mr. Y, AWS won't work, for these reasons, and I can't host it myself, for these reasons. But we have a very good option: with an open-source AWS alternative like Ubicloud, we can configure your own servers to behave like AWS and make them remotely accessible to me or a dev team. So instead of using AWS, I work against your servers as if they were AWS. First, your data never leaves your environment. Second, I can still build the whole application end-to-end with the same AWS-like workflow."

And just like that, the data-privacy problem is addressed at its root: the data stays on hardware that Y owns. You (X) are no longer the one providing hardware, so you can focus on your service. Development continues remotely with a familiar, AWS-like workflow.
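One small habit that supports the "data never leaves your environment" promise is to have the application refuse any endpoint that isn't inside Y's own network. The sketch below is only an illustration: the `10.20.0.0/16` range and the function name are placeholders I made up, and it has nothing to do with Ubicloud's actual API. It accepts IP literals only and rejects hostnames rather than guessing where they resolve.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical: the address range of Y's own servers.
Y_NETWORKS = [ipaddress.ip_network("10.20.0.0/16")]


def stays_inside(url: str, networks=Y_NETWORKS) -> bool:
    """Return True only if the URL's host is an IP address inside Y's network."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal: refuse instead of guessing.
        return False
    return any(addr in net for net in networks)


print(stays_inside("http://10.20.3.4:8000/v1/chat"))  # True
print(stays_inside("https://api.example.com/v1"))     # False
```

A check like this doesn't replace proper network controls (firewalls, VPN access for the remote dev team), but it catches the easy mistake of pointing the app at an outside service.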

The only downside: if Y is new on the software side, they'll need to buy a strong machine and set up the hardware ecosystem. But they'd have to do that anyway to build a serious software foundation in their domain.

For more details

This turned out a little jargon-heavy. If anything isn't clear, you can always talk it through with an LLM like ChatGPT to get a better feel for what I mean by Ubicloud, AWS, and so on.

It sort of became a case study. And for what it's worth, this is a real situation, not a hypothetical; I just can't share the names. I had a fresh conversation about it earlier today, so I wanted to write it down. Maybe it helps someone, or at least shows that AWS isn't the only option out there, and helps us appreciate the value of open source.

One more thing: I strongly recommend researching how open source actually makes money and what kind of business model it is. On most projects, I think the first question we should ask is: how can I solve this with open-source tools? That's a question every developer should be asking.

Until the next post, goodbye!
