For 4x superresolution, use UltraSharpV2.
I evaluated the following 4x superresolution methods on 90 images I had lying around: Real-ESRGAN, Remacri, 4x UltraSharpV2, LDSR, StableSR. Of these methods, I think 4x UltraSharpV2 is the best.
New solutions for superresolution are constantly being pushed out, and I would like to just use the best one, but unfortunately these new solutions rarely come with good showcases, and I don't know of good third-party reviews either. Most people publish just two or three images alongside their product, as if that meant anything.
For me to make a decision, I need comparisons with competing techniques (including the base case: a non-intelligent, simple upscaling algorithm), based on a large variety of images, showing both examples that present the new technique in the best possible light and failure cases.
I chose 90 images that I had lying around and applied the following algorithms to them to produce 4x the input resolution (both width and height, i.e. 16x the number of pixels): Lanczos (the non-intelligent baseline), Real-ESRGAN, Remacri, 4x UltraSharpV2, LDSR, StableSR.
I didn't test SeeSR, CCSR, or SUPIR, because they were too annoying to install, their hardware requirements were too high for me, or they just didn't advertise their abilities very well.
I looked at the resulting images very closely and cut out one part (or sometimes multiple parts) from each image for comparison. These parts are zoomed in to make comparison easier, which of course means the rest of the image isn't visible; the superresolution methods were still run on the uncropped images. If you want to compare the images in their entirety, you can download all results. I'm not the copyright holder of the images, so I can't just post a download link to the original high-resolution versions, but if you want to make your own comparison, contact me and I might be able to make the originals available to you. And be aware: now that this comparison is published, future superresolution techniques cannot be compared using the same images, because it is possible to tune a new technique such that it excels on this benchmark while being mediocre at everything else.
Careful, the linked image is humongous. But if you want to see the differences clearly, you have to look at it. This preview loses too much detail.
Based on these images, you will probably understand why I think 4x UltraSharpV2 is the best of the tested methods. None of these images are NSFW, but I tested the methods on some NSFW images as well and can say that UltraSharpV2 beats the others in that area too. If you don't see how that's relevant: ANN-based methods only excel on the kind of data they have been trained on, so you can't expect them to produce good results on data they haven't seen. And unfortunately, you just can't generalize from a small subset of the training data to the rest (e.g. housecats have vertical pupils, tigers have round pupils, even though both are cats). Different datasets also push towards different preferences, because their demands are incompatible. A good example is real-world photos versus manga/anime/cartoons: drawn images want very sharp, regular, high-contrast lines surrounding mostly flat areas, while photos rarely contain such features. An ANN thus has to decide whether to reconstruct a given detail as photorealistic or as a drawing. Consequently, showing you images from just one subset (SFW in this case) does not allow you to draw conclusions about the output quality for other input image subsets.
Some thoughts about the different methods:

The heavy-weight, slow methods LDSR and StableSR do not seem to outperform the fast methods except in very few examples, and they underperform significantly in many, which makes the extra effort questionable. They often hallucinate small details like text, which is probably due to their multi-step nature.

The StableSR implementation I used (ComfyUI) seems to have a bug where some parts of the output image are not processed at all (i.e. they remain at the low input resolution). I did not try to fix or work around that, because StableSR does not produce great results even in the parts of the image where it does work.

LDSR seems unable to produce certain output resolutions exactly, probably because it is limited to multiples of a base number (just like Stable Diffusion can only produce output dimensions divisible by 8, because its latent is downsampled 8x relative to the input resolution). Because of that, some of the LDSR crops are misaligned. I have not tried to hide that, because such a limitation is relevant for a comparison. Maybe it's the fault of the LDSR implementation I used. The other methods do not seem to have this limitation.
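To make the resolution constraint concrete, here is a small sketch of the rounding it forces. The factor 8 matches the Stable Diffusion example above; LDSR's actual base multiple may differ, so treat the numbers as illustrative:

```python
# A model whose latent is downsampled 8x can only emit output dimensions
# divisible by 8, so a 4x upscale of an arbitrary input must be rounded.

def achievable_size(width: int, height: int, scale: int = 4, multiple: int = 8):
    """Round the requested output size up to the nearest achievable one."""
    def round_up(x: int) -> int:
        return ((x + multiple - 1) // multiple) * multiple
    return round_up(width * scale), round_up(height * scale)

# A 123x77 input should yield exactly 492x308, but with an 8-pixel
# constraint the closest achievable output is 496x312 -- which is why
# crops from such a method end up slightly misaligned.
print(achievable_size(123, 77))  # (496, 312)
```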
Real-ESRGAN erases or blurs a lot of detail (like the veins in the dragonfly wing). Remacri does not, but it goes too far in the other direction and adds too many hallucinations and exaggerated features (e.g. the penguin colony and the hair on the drinking monkey's head). That UltraSharpV2 strikes a good balance between these extremes is the main reason I consider it the winner. It is also clear that even the best method often doesn't add much beyond edge enhancement/sharpening relative to the Lanczos output, so if you need a really fast and memory-efficient solution, don't feel bad about using Lanczos + sharpening (not among the tested methods). And finally, it should be clear that even the best method does not come close to the ground truth for most inputs, so downsampling followed by superresolution cannot be a replacement for the original. That should not be surprising: some small details are unreconstructible, because the method would have to guess. Examples:
If you want to run the winner at home interactively, I would recommend ComfyUI .
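If you only need the fast Lanczos-plus-sharpening fallback mentioned earlier, a minimal sketch using Pillow looks like this (my choice of library; the unsharp-mask parameters are illustrative, not tuned values from this comparison):

```python
from PIL import Image, ImageFilter

def lanczos_sharpen_4x(img: Image.Image) -> Image.Image:
    """Plain Lanczos 4x upscale followed by an unsharp mask."""
    up = img.resize((img.width * 4, img.height * 4), Image.LANCZOS)
    return up.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=2))

# Demo on a synthetic gradient; replace with Image.open("photo.png") for real use.
demo = Image.new("L", (16, 16))
demo.putdata([((x + y) * 8) % 256 for y in range(16) for x in range(16)])
big = lanczos_sharpen_4x(demo)
print(big.size)  # (64, 64)
```

This runs in milliseconds on a CPU and needs no model weights, which is exactly the trade-off described above.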
In short: low resolution image in, high resolution image out. That means the algorithm has to guess detail that isn't actually there in the source image. There are two very different approaches. The first tries to produce a high resolution image that is representative of all high resolution images corresponding to the low resolution input, without inventing details that cannot actually be known without access to the probability distribution of all real-world high resolution images. The second does try to invent such unknowable detail in order to make the final image look more realistic, at the cost of no longer representing all high resolution images consistent with the input. In effect, the second approach picks one high resolution image out of an infinitude of possible choices, while the first averages over all of them. The first approach would not produce a realistic human face when 4x or 8x upscaling a very low resolution face (something like 8x8 pixels), but the second would attempt exactly that, as long as it recognizes the face as a face.
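The distinction can be stated compactly in terms of the posterior distribution over high resolution images (my notation, not from any of the tested methods' papers):

```latex
% Let y be the low resolution input and p(x \mid y) the posterior over
% high resolution images x consistent with y.
% Approach 1: output the posterior mean -- it averages all candidate
% reconstructions, which is why it cannot commit to a specific face:
\hat{x}_{\text{avg}} = \mathbb{E}[x \mid y] = \int x \, p(x \mid y) \, dx
% Approach 2: output a single sample from the posterior -- realistic
% detail, but only one of infinitely many valid reconstructions:
\hat{x}_{\text{sample}} \sim p(x \mid y)
```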
Written by the author; Date 12.02.2026; © 2026 spinningsphinx.com