basnijholt's comments

basnijholt · 2025-03-25T06:43:32 1742885012

I built dotbins, a Python-based tool designed to simplify managing and version-controlling pre-compiled CLI binaries directly within your dotfiles repository.

See this repository of my personal setup https://github.com/basnijholt/.dotbins (repo fully created by the dotbins tool!)

Why dotbins?

Cross-platform: Effortlessly manages CLI tools on macOS, Linux, and Windows.

No Admin Rights Needed: Perfect for restricted environments or quick setups.

Auto-download & Update: Automatically fetches and updates CLI binaries from GitHub releases.

Git Integration: Enables seamless synchronization of CLI tools alongside your configurations.

Example workflow:

Quickly install a tool directly from GitHub: `dotbins get junegunn/fzf`

Sync all CLI tools defined in a simple YAML config: `dotbins sync`

I created dotbins out of personal frustration managing tools across different systems. It's streamlined my setup process significantly—I hope it helps you too.

The project is open-source, and feedback or contributions are welcome!

Check it out here: https://github.com/basnijholt/dotbins

basnijholt · on Feb 9, 2025

I wouldn’t say it is better, just different trade offs. Using this tool you can see/restore all files in normal folder structures without requiring the tool itself.

basnijholt · on Feb 8, 2025

Hi Hacker News, I'm excited to share my project, where I took on the challenge of porting a popular but untested 600+ line Bash script to Python. The outcome is rsync-time-machine.py (https://github.com/basnijholt/rsync-time-machine.py), a Python implementation of the rsync-time-backup (https://github.com/laurent22/rsync-time-backup) script. It provides Time Machine-style backups using rsync and creates incremental backups of files and directories to the destination of your choice.

The tool is designed to work on Linux, macOS, and Windows (via WSL or Cygwin). Its advantage over Time Machine is its flexibility - it can backup from/to any filesystem and works on any platform. You can also backup to a Truecrypt drive without any issues.

Unlike the original Bash script, rsync-time-machine.py is fully tested. It has no external dependencies (only requires Python ≥3.7), and it is fully compatible with rsync-time-backup (https://github.com/laurent22/rsync-time-backup). It offers pretty terminal output and is fully typed.

Key features include:

* Each backup is in its own folder named after the current timestamp.

* Backup to/from remote destinations over SSH.

* Files that haven't changed from one backup to the next are hard-linked to the previous backup, saving space.

* Safety check - the backup will only happen if the destination has explicitly been marked as a backup destination.

* Resume feature - if a backup has failed or was interrupted, the tool will resume from there on the next backup.

* Exclude file - support for pattern-based exclusion via the --exclude-from rsync parameter.

* Automatically purge old backups based on a configurable expiration strategy.

* "latest" symlink that points to the latest successful backup.

To learn more about how to use and install rsync-time-machine.py, check out the GitHub repo (https://github.com/basnijholt/rsync-time-machine.py).

I appreciate any feedback and contributions! Feel free to file an issue on the GitHub repository for any bugs, suggestions, or improvements. Looking forward to hearing your thoughts.

Happy backing up!

Please, do let me know if you have any questions or need any further information.

basnijholt · on Dec 22, 2024

I've developed PipeFunc, a new Python library designed to simplify the creation and execution of DAG-based computational pipelines, specifically targeting scientific computing and data analysis workflows. It's built for speed and ease of use, with a focus on minimizing boilerplate and maximizing performance.

Key features:

• Automatic Dependency Resolution: PipeFunc automatically determines the execution order of functions based on their dependencies, eliminating the need for manual dependency management. You define the relationships, and PipeFunc figures out the order.

• Ultra-Low Overhead: The library introduces minimal overhead, measured at around 15µs per function call. This makes it suitable for performance-critical applications.

• Effortless Parallelism: PipeFunc automatically parallelizes independent tasks, and it's compatible with any `concurrent.futures.Executor`. This allows you to easily leverage multi-core processors or even distribute computation across a cluster (e.g., using SLURM).

• Built-in Parameter Sweeps: The `mapspec` feature provides a concise way to define and execute N-dimensional parameter sweeps, which is often crucial in scientific experiments, simulations, and hyperparameter optimization. It uses an index-based approach to do this in parallel with minimal overhead.

• Advanced Caching: Multiple caching options helps avoid redundant computations, saving time and resources.

• Type Safety: PipeFunc leverages Python's type hinting to validate the consistency of data types across the pipeline, reducing the risk of runtime errors.

• Debugging Support: Includes an `ErrorSnapshot` feature that captures detailed error state information, including the function, arguments, traceback, and environment, to simplify debugging and error reproduction.

• Visualization: PipeFunc can generate visualizations of your pipeline to aid in understanding and debugging.

Comparison with existing tools:

• vs. Dask: PipeFunc provides a higher-level, declarative approach to pipeline construction. It automatically handles task scheduling and execution based on function definitions and `mapspec`s, whereas Dask requires more explicit task definition.

• vs. Luigi/Airflow/Prefect/Kedro: These tools are primarily designed for ETL and event-driven workflows. PipeFunc, in contrast, is optimized for scientific computing and computational workflows that require fine-grained control over execution, resource allocation, and parameter sweeps.

Use Cases:

• Scientific simulations and data analysis

• Machine learning pipelines (preprocessing, training, evaluation)

• High-performance computing (HPC) workflows

• Complex data processing tasks

• Any scenario involving interconnected functions where performance and ease of use are important

I'd appreciate any feedback, especially regarding performance, usability, and potential applications in different scientific domains.

Links:

Documentation: https://pipefunc.readthedocs.io

Source Code: https://github.com/pipefunc/pipefunc

basnijholt · on May 14, 2023

Actually, for raw speed, rsync is much faster than any of the tools you mentioned (see e.g., https://github.com/borgbackup/borg/issues/4190). I really like a lightweight solution, where I do not even need any tool to restore backups. The tools you mentioned are great though.

indianets · on May 15, 2023

Did you see the last reply on the thread you linked. The guy messed up ENV variable in borg and was doing too many account backups as new archives killing the cache when the same account was backed up next day. Borg will always be faster than rsync while doing incremental backups, but of course has a learning curve coming from the simplicity of rsync.

aborsy · on May 14, 2023

Those tools do a lot more: encryption, deduplication, compression. Even then, they have been faster in my experience, due to caching.

basnijholt · on May 14, 2023

I attempted doing this initially. Everything is very doable except SSH. One would have to rely on third-party libraries to do this well.

sgarland · on May 14, 2023

Are you open to a single dependency [0]? Entirely native tooling is an admirable thing that I greatly appreciate, but parsing subprocess output is fraught with issues (I know, I've done this as well).

[0]: https://github.com/ParallelSSH/ssh-python

memco · on May 15, 2023

I’ve used paramiko for this also: https://www.paramiko.org/.

basnijholt · on May 13, 2023

Hi Hacker News,

I'm excited to share my recent project, where I took on the challenge of porting a popular but untested 600+ line Bash script to Python. The outcome is [`rsync-time-machine.py`](https://github.com/basnijholt/rsync-time-machine.py), a Python implementation of the [`rsync-time-backup`](https://github.com/laurent22/rsync-time-backup) script. It provides Time Machine-style backups using rsync and creates incremental backups of files and directories to the destination of your choice.

The tool is designed to work on Linux, macOS, and Windows (via WSL or Cygwin). Its advantage over Time Machine is its flexibility - it can backup from/to any filesystem and works on any platform. You can also backup to a Truecrypt drive without any issues.

Unlike the original Bash script, `rsync-time-machine.py` is fully tested. It has no external dependencies (only requires Python ≥3.7), and it is fully compatible with [`rsync-time-backup`](https://github.com/laurent22/rsync-time-backup). It offers pretty terminal output and is fully typed.

Key features include:

* Each backup is in its own folder named after the current timestamp. * Backup to/from remote destinations over SSH. * Files that haven't changed from one backup to the next are hard-linked to the previous backup, saving space. * Safety check - the backup will only happen if the destination has explicitly been marked as a backup destination. * Resume feature - if a backup has failed or was interrupted, the tool will resume from there on the next backup. * Exclude file - support for pattern-based exclusion via the `--exclude-from` rsync parameter. * Automatically purge old backups based on a configurable expiration strategy. * "latest" symlink that points to the latest successful backup.

To learn more about how to use and install `rsync-time-machine.py`, check out the [GitHub repo](https://github.com/basnijholt/rsync-time-machine.py).

I appreciate any feedback and contributions! Feel free to file an issue on the GitHub repository for any bugs, suggestions, or improvements. Looking forward to hearing your thoughts.

Happy backing up!

Please, do let me know if you have any questions or need any further information.

pwg · on May 14, 2023

The description sounds like it does largely the same job as rsnapshot (https://rsnapshot.org/). What does yours do differently from rsnapshot?

Gys · on May 14, 2023

Rsnapshot needs perl? Not available by default on all systems?

sgarland · on May 14, 2023

This needs Python 3.7. Which systems have Python 3.7 by default, but don't include Perl?

pyuser583 · on May 14, 2023

Is rsync-time-machine.py a valid module name? Or does it not matter because it’s a script?

basnijholt · on April 11, 2023

Hey HN! I recently created a tool called markdown-code-runner (https://github.com/basnijholt/markdown-code-runner) that allows you to run code blocks in markdown files. It also supports hidden code blocks in Markdown comments, such that the code is hidden and only the output is shown. It supports multiple languages and can be used to test code snippets in documentation or tutorials. I think it's a great tool for developers and technical writers and I'm excited to share it with the community. Let me know what you think!