  • You don’t have to convince me that Rust rocks. I just need to convince my team that it’s worth the investment in terms of time to onboard everyone, time to port our application, and the risk of introducing bugs.

    We have a complex mix of CRUD, math-heavy algorithms, and data-transformation logic. Fortunately, each of those is largely broken up into microservices, so they can be replaced as needed. If we decide to port, we can at least do it a little at a time.

    The real question is whether the team wants to maintain a Python app or a Rust app, and since almost nobody on the team has professional experience with low-level languages and our load is pretty small (a few thousand users; B2B), Python is preferred.


  • Yup, I guess not. But if I were on the product team, the customers and the director would be the bosses. And on it goes up to the CEO, where the board and shareholders are the bosses.

    If I can justify the change, we’ll do it. That’s close enough for me. And I did do a POC w/ Rust and could’ve switched one service over, but I campaigned against myself since we got good enough perf w/ Python (numpy + numba) and I was the only one who wanted it. That has changed, so I might try again with another service (prob our gateway, we have 3 and they all kinda suck).

    I’ll have to check out Deno again. I remember looking at it (or something like it) a couple of years ago when it was first announced on Reddit.


  • Well, I’m kind of the boss, but I inherited the Python codebase. The original reasoning was it’s easier to hire/on-board people, which I think is largely true.

    If it were up to me, I’d rewrite a bunch of our code in Rust. I use it for personal projects already, so I know the ecosystem. But that’s a tough sell to the product team, so it’s probably not happening anytime soon. I’d also have to retrain everyone, which doesn’t sound fun…

    However, any change I make needs to work smoothly for our devs, and we have a few teams across 3 regions. So it needs clear advantages to be worth the pain of addressing everyone’s concerns.


  • That’s pretty impressive! We have a bunch of compiled stuff (numpy, tensorflow, etc.), so I’m guessing we wouldn’t see as dramatic an improvement.

    Then again, <1 min is “good enough” for me, certainly good enough not to warrant a rewrite. But I’ll have to try uv out; maybe we’ll switch to it. We switched from requirements.txt -> pyproject.toml using poetry, so maybe it’s worth trying out the improved pyproject.toml support. Our microservices each take ~30s to install (I think w/o cache?), which isn’t terrible, and it’s a relatively insignificant part of our build pipelines, but rebuilding everything from scratch when we upgrade Python is a pain.


  • Both of those are largely bound by i/o, but with some processing in between, so the best way to speed things up is probably an async i/o loop that feeds a worker pool. In Python, you’d use processes, which can be expensive and a little complicated, but workable.
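
    Roughly the shape I mean is sketched below. It’s only a sketch, not your pipeline: the transform() step and the fake chunks are placeholders I made up to show an asyncio loop handing CPU-bound work to a process pool.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def transform(chunk: bytes) -> bytes:
    # Placeholder for the CPU-bound processing step between the i/o stages.
    return chunk.upper()

async def consume(queue: asyncio.Queue, pool: ProcessPoolExecutor) -> None:
    loop = asyncio.get_running_loop()
    while (chunk := await queue.get()) is not None:
        # Hand the CPU-bound step to a worker process so the event loop stays free for i/o.
        result = await loop.run_in_executor(pool, transform, chunk)
        print(len(result))

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    with ProcessPoolExecutor() as pool:
        worker = asyncio.create_task(consume(queue, pool))
        for chunk in (b"foo", b"bar"):   # stand-in for async reads from disk or the network
            await queue.put(chunk)
        await queue.put(None)            # sentinel: no more work
        await worker

if __name__ == "__main__":
    asyncio.run(main())
```

    The same structure scales to multiple consumers; the queue is the only coordination point.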

    And as you pointed out, scons and pip exist, and they’re fast enough. I actually use poetry, and it’s completely fine.

    You could go all out and build something like cargo, but it’s the architecture decisions that matter most in something i/o bound like that.


  • Exactly. We have hundreds of thousands of lines of code that work reasonably well. I think we made the important decisions correctly, so performance issues in one area rarely impact others.

    We rewrote ~1k lines of poorly running Fortran code into well-written Python code, and that worked because we got the important parts right (reduced big-O CPU from O(n^3) to O(n^2 log n) and memory from O(n^4) to O(n^3)). Runtime went from minutes to seconds on medium-size data sets, and large data sets became possible to run at all (those would OOM due to O(n^4) storage in RAM). If you get the important parts right, Python is probably good enough, and you can get linear optimizations from there by moving parts to a compiled language (or using a JIT like numba). Python wasn’t why we could make it fast; it’s just what we prototyped with so we could focus on the architecture, and we stopped optimizing when it was fast enough.
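
    For the “linear optimizations from there” part, here’s a hedged toy example (not our actual code; the function is something I invented for illustration): once the loop is already the right complexity, numba’s @njit buys a constant-factor speedup without touching the architecture.

```python
import numpy as np
from numba import njit

@njit(cache=True)
def min_gap(xs):
    # Assumes xs is already sorted, so one linear pass replaces an O(n^2) all-pairs scan.
    best = np.inf
    for i in range(1, xs.shape[0]):
        gap = xs[i] - xs[i - 1]
        if gap < best:
            best = gap
    return best

data = np.sort(np.random.default_rng(0).random(1_000_000))
print(min_gap(data))  # first call includes JIT compilation; later calls run at native speed
```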


  • That’s why you need an architect to design the project for the expected requirements. They’ll ask the important questions, like:

    • how many users (if it’s a server)
    • any long-running processes? If so, what will those be doing?
    • how responsive does it need to be? What’s “fast enough”?
    • what’s more important short term, feature delivery or performance? Long term? How far away is “long term”?
    • what platforms do we need to support?
    • is AI or similar in the medium term goals? What’s the use case?
    • how bad is downtime? How much are you willing to spend if downtime isn’t an option?

    You don’t need all the answers up front, but you need enough to design a coherent system. It’s like building a rail system: a commuter line is much different from a light rail network, and the planners need to know whether those systems have to interact with anything else.

    If you don’t do that, you’re going to end up overspending in some area, and probably significantly.


  • I have an alternative anecdote.

    My coworker has a Ph.D. in something domain-specific, and he wrote an app to do some complex simulation. The simulation worked well on small inputs (like 10) but took minutes on larger inputs (~100), and we wanted to support very large inputs (1000+), where the program would get killed with out-of-memory errors.

    I (CS background) looked over the code and pointed out two issues:

    • bubble sort in a hot path
    • allocated all working memory at the start and used 4D arrays, when 3D arrays and a 1D result array would’ve sufficed (O(n^4) -> O(n^3))

    Both problems would have been avoided had they used Python, but they used Fortran because “fast,” and Fortran doesn’t have a built-in sort or rich data structures. Python provides classes, sortable lists (with Timsort!), etc., so they could’ve structured the code better and avoided the architectural mistakes that made runtime and memory explode. Had they done that, I could’ve solved the performance problems by switching lists to numpy arrays and throwing numba on the hot loops and been done in a day; instead we spent weeks rewriting it (nobody understands Fortran, and apparently that included the original dev).
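
    To make that concrete, here’s a hedged sketch of the shape of the fix (the names, array shapes, and per-step update are invented for illustration; the real simulation is obviously more involved): built-in sorting instead of bubble sort, and a 3D working array plus a 1D result instead of a preallocated 4D array.

```python
import numpy as np

def simulate(samples, n):
    samples = sorted(samples)                    # Timsort, O(m log m), replaces the bubble sort
    work = np.zeros((n, n, n))                   # O(n^3) working memory instead of O(n^4)
    result = np.zeros(n)                         # the 1D output is all we actually keep
    for step, s in enumerate(samples, start=1):
        work += s / step                         # stand-in for the real per-step update
        result += work.sum(axis=(1, 2)) * 1e-6   # collapse to the 1D quantity of interest
    return result

print(simulate([0.3, 0.1, 0.2], n=16)[:4])
```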

    Python lets you focus on the architecture. Compiled languages often get you stuck in the weeds, especially if you don’t have a strong CS background and just hack things until it works.


  • Python is also pretty good for production, provided you’re using libraries optimized in something faster. Is there a reason you didn’t use Open3D’s Python library? I’m guessing you’d get close to the same performance as the C++ code in a lot less development time.
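
    Something along these lines is what I had in mind with the Python bindings (hedged: the file name and parameters are placeholders, and I’m only assuming the standard open3d calls); the heavy lifting stays in Open3D’s C++ core either way.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")          # hypothetical input file
pcd = pcd.voxel_down_sample(voxel_size=0.02)       # thin the cloud before displaying it
pcd.estimate_normals()                             # normals make the shading legible
o3d.visualization.draw_geometries([pcd])           # blocking viewer window
```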

    That said, if you’re doing an animation in 3D, you should probably consider a game engine. Godot w/ GDScript would probably be good enough, though you’d spend a few days learning the engine (but the next project would be way faster).

    If you’re writing a performance-critical library, something compiled is often the better choice. If you’re just plugging libraries together, something like Python is probably a better use of your time, since the vast majority of the CPU work can generally happen inside a library.


  • We also do “real work,” and that uses libraries that use C(++) under the hood, like scipy, numpy, and tensorflow. We do simulations of seismic waves, particle physics simulations, etc. Most of our app is business logic in a webapp, but there’s heavy lifting as well. All of “our” code is Python. I even pitched using Rust for a project, but we were able to get the Python code “fast enough” with numba.

    We separate expensive logic that can take longer into background tasks, away from the requests that need to finish quickly. We auto-scale horizontally as needed so everything remains responsive.
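
    As a hedged sketch of that split (Celery and the Redis broker here are just stand-ins for “some task queue”; I’m not naming our actual stack): the request handler only enqueues and returns, and the slow part runs on a worker.

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")   # placeholder broker URL

@app.task
def run_simulation(job_id: str) -> None:
    ...  # the slow, CPU-heavy work runs on a worker box, not in the request cycle

def handle_request(job_id: str) -> dict:
    run_simulation.delay(job_id)              # enqueue and return immediately
    return {"status": "queued", "job": job_id}
```

    The workers scale horizontally independently of the web tier, which is the “increase hosting costs” lever mentioned below.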

    That’s what I mean by “architected well”: everything stays responsive, and we just increase our hosting costs instead of development costs. If we need to, we could always rewrite parts in a faster language, provided the development cost of the rewrite is less than the hosting costs it saves. We really don’t spend much time at all optimizing Python code, so we’re not at that point yet.

    That being said, I do appreciate faster-running code. I use Rust for most of my personal projects, but that’s because I don’t have to pay a team to maintain my projects.