Much has been written about the fact that the speed of individual CPU cores isn’t increasing at the rate it did from 1980 through 2004 or so. Instead, chip makers are now turning to multi-core designs to boost performance. (See this brief primer from Jason Snell at Macworld.) Thus a lot of people have been asking whether Photoshop takes advantage of these new systems. The short answer is yes, Photoshop has included optimizations for multi-processor machines (of which multi-core systems are a type) for many years.
What may not be obvious to a non-engineer like me, however, is that not all operations can or should be split among multiple cores, as doing so can actually make them slower. Because memory bandwidth hasn’t kept pace with CPU speed (see Scott Byer’s 64-bit article for more info), the cost of moving data to and from each CPU can be significant. To borrow a factory metaphor from Photoshop co-architect Russell Williams, "The workers run out of materials & end up standing around." The memory bottleneck means that multi-core can’t make everything faster, and we’ll need to think about doing new kinds of processing specifically geared towards heavy computing/low memory usage.
Because Russell has forgotten more than I will ever know about this stuff, I’ve asked him to share some info and insights in the extended entry. Read on for more.
Intel-based architectures don’t necessarily add memory bandwidth as they add cores. A single CPU on a system with limited memory bandwidth can often saturate the memory bandwidth if it just moves a big chunk of memory from here to there. It even has time to do several arithmetic operations in between and still saturate the memory. If your system is bandwidth-limited and the operation you want to do involves moving a big chunk of data (bigger than the caches) from here to there while doing a limited number of arithmetic operations on it, adding cores cannot speed it up no matter how clever the software is. Many Photoshop operations are in this category, for instance.
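To make the bandwidth argument concrete, here is a minimal sketch of a simplified cost model in which an operation takes as long as the slower of its memory traffic and its arithmetic. The function name and every number below (bandwidth, per-core speed, byte and operation counts) are hypothetical values chosen for illustration, not measurements of Photoshop or of any real machine.

```python
def op_time_seconds(bytes_moved, arithmetic_ops, cores,
                    bandwidth_bytes_per_s, ops_per_core_per_s):
    """Simplified model: time is bounded by the slower of memory and compute.

    Assumes memory bandwidth is shared by all cores (it does not grow with
    the core count) and that the arithmetic divides perfectly across cores.
    """
    memory_time = bytes_moved / bandwidth_bytes_per_s
    compute_time = arithmetic_ops / (cores * ops_per_core_per_s)
    return max(memory_time, compute_time)


# Hypothetical machine: 6 GB/s of shared bandwidth, 10 billion ops/s per core.
BANDWIDTH = 6e9
CORE_SPEED = 10e9

# Hypothetical "light" operation: read 30 MB, write 30 MB, few ops per byte.
light_1 = op_time_seconds(60e6, 90e6, 1, BANDWIDTH, CORE_SPEED)
light_8 = op_time_seconds(60e6, 90e6, 8, BANDWIDTH, CORE_SPEED)

# Hypothetical "heavy" operation: same data, far more arithmetic per byte.
heavy_1 = op_time_seconds(60e6, 3e9, 1, BANDWIDTH, CORE_SPEED)
heavy_8 = op_time_seconds(60e6, 3e9, 8, BANDWIDTH, CORE_SPEED)

print("light op speedup with 8 cores:", light_1 / light_8)
print("heavy op speedup with 8 cores:", heavy_1 / heavy_8)
```

Under these assumed numbers the light operation is already memory-bound on one core, so eight cores give no speedup, while the heavy operation stays compute-bound and scales with the core count.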
AMD’s architecture adds memory bandwidth as you add CPU chips, but taking advantage of it can depend on placing the data into the different areas of physical RAM attached to the different chips. It doesn’t do any good if all your data gets put into one of the memory banks; then you’re right back where you started. So, the memory system and how it’s used will have a big effect on how many things speed up when you add more cores to a computer.
The other issue is Amdahl’s Law, described by computer architect Gene Amdahl in the 1960s. Almost all algorithms that can be parallelized also have some portion that must be done sequentially — setup (deciding how to divide the problem up among multiple cores) or synchronization, or collecting and summarizing the results. At those times each step depends on the step before being completed. As you add processors and speed up the parallel part, the sequential part inevitably takes up a larger percentage of the time. If 10% of the problem is sequential, then even if you add an infinite number of processors and get the other 90% of the problem done in zero time, you can achieve at most a 10X speedup. And some algorithms are just really hard or impossible to parallelize: calculating text layout on a page is a commonly cited example.
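The 10X ceiling follows directly from the standard statement of Amdahl’s Law. Here is a short sketch; the function name and the choice of core counts are illustrative only.

```python
def amdahl_speedup(sequential_fraction, processors):
    """Overall speedup when only the parallel part is divided among processors."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# 10% sequential work, as in the example above.
for n in (1, 2, 4, 8, 16, float("inf")):
    print(n, round(amdahl_speedup(0.10, n), 2))
```

With a 10% sequential portion, 8 cores yield a speedup of roughly 4.7X rather than 8X, and the value approaches 10X only as the processor count grows without bound.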
These two basic issues are why the giant massively parallel machines have RAM attached to each node and are used to solve only a small set of specially selected, specially coded problems — usually ones where the parallel part of the problem itself has been scaled up to enormous sizes. As the number of cores goes up, the likelihood that a particular problem will hit one of the above limits goes up.
Why does video rendering scale better than Photoshop? Rendering video is typically done by taking some source image material for a frame and performing a stack of adjustments and filters on it. Each frame is only a few hundred thousand pixels (for standard definition) or at most 2 megapixels or 8MB in 8-bit (for HD). Thus, particularly for standard definition images, the cache gives a lot more benefit as a sequence of operations is performed on each frame: for each frame, you fetch the data, do several operations, and write the final result. Different frames can usually be rendered in parallel, one per processor, so each processor does a fair chunk of computation for each byte read from or written to memory.
By contrast, in Photoshop most time-consuming operations are performed on a single image layer, and the problem is the size of that layer: 30MB for an 8-bit image from a 10MP digital camera, or 60MB if you keep all the information by converting the raw file to 16 bit. If you’ve merged some Canon 1DSMkII images to HDR, that’s over 200MB. And of course the people most concerned with speeding up Photoshop with more cores are the ones with the giant images. When you run a Gaussian Blur on that giant image, the processor has to read all of it from memory, perform relatively few calculations, and then write the result into newly allocated memory (so you can undo it). You can work on different pieces of the image on different processors, but you’re not doing nearly as much computation on each byte fetched from memory as in the video case. The operations that scale best in Photoshop are those that:
- Do a lot of computation for each pixel fetched. Shadow/Highlight correction is an example of an operation that has to do a lot of computation on each byte fetched, while normal blending does very little. A giant-radius blur is an example of the opposite extreme: lots of pixels have to be fetched to do a simple computation and produce one output pixel.
- Do pixel-based operations that take advantage of Photoshop’s framework for parallel computation. Most filters and adjustments fall into this category. But many text tool operations and the solution of partial differential equations required for the healing brush are examples of things that don’t fit this framework.
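The frame and layer sizes quoted above are simple arithmetic: pixels times channels times bytes per channel. The sketch below reproduces them under assumptions that the text does not state: three channels (RGB) for still images, four channels for the HD video frame, 4 bytes per channel for HDR, and a 16.7MP sensor for the Canon camera.

```python
def layer_megabytes(megapixels, channels, bytes_per_channel):
    """Uncompressed size of one layer, in MB (1 MB taken as 10**6 bytes)."""
    return megapixels * channels * bytes_per_channel


# Channel counts and the 16.7MP figure are assumptions made for illustration.
sizes = {
    "HD video frame, 8-bit, 4 channels": layer_megabytes(2, 4, 1),
    "10MP still, 8-bit RGB": layer_megabytes(10, 3, 1),
    "10MP still, 16-bit RGB": layer_megabytes(10, 3, 2),
    "16.7MP HDR, 32-bit RGB": layer_megabytes(16.7, 3, 4),
}

for name, megabytes in sizes.items():
    print(f"{name}: {megabytes:.1f} MB")
```

An 8MB frame can plausibly stay in cache across a stack of operations, while a 200MB layer has to stream through main memory on every pass, which is the contrast the video comparison draws.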
To take good advantage of 8- or 16- core machines (for things other than servers), we’ll need machines whose bandwidth increases with the number of cores, and we’ll need problems that depend on doing relatively large amounts of computation for each byte fetched from main memory (yes, re-reading the same data you’ve already fetched into the caches counts). Complex video and audio signal processing are good examples of these kinds of tasks. And we’re always looking for more useful things that Photoshop can do that are more computationally intensive.
— Russell Williams
John, Russell, thanks for all of the explanations, but I wonder if you can provide specific examples of machine configurations that provide the best performance for Photoshop?
Which CPU, how much memory, which disk drive configurations for both PC and MAC?? Thanks!
[Jerry, I’d start with the overviews in the support pages of Adobe.com; check out the performance optimization guides for Windows and Mac. You can also Google “optimizing Photoshop” to get more leads. –J.]
> Intel-based architectures don’t necessarily add memory bandwidth as they add cores.
Is this the issue that dogged Intel’s early Dual-cores and got addressed by the DUO architecture?
[I don’t know, but I’ll pass the question to Russell & Scott. –J.]
Thanks for this peek behind the curtain.
One question – is the essential message here that an 8-core Intel machine offers only a modest speed bump (if any) over a 4-core Intel machine when being used for Photoshop work? For example – Shadow/Highlight would benefit, but not Gaussian Blur?
[Let me ping Russell for info. –J.]
John
Thanks so much for your ongoing insights into things Adobe, et al.
[My pleasure; thanks for checking ’em out. –J.]
Question: does CS3 utilize the “Core Image” of Mac OSX and the GPU of the computer video card (especially in turn of the century G4 Macs)?
[No, Photoshop does not use Core Image. –J.]
Happy New Year
[Thanks, and same to you! –J.]
This is very timely. We had a discussion on our production floor about this and we were split evenly.
The optimization guides are very helpful in a mixed environment.
I want to thank you for this post. I was unaware of the blog until cnet pointed to it.
Thanks again.
[Cool–glad to hear it. –J.]
No, the Core Duo chips do nothing to solve the bandwidth problems.
Unfortunately, when CPU makers cite speedups they’re talking about extremely compute-limited operations (ray tracing, numerical simulation, etc.). They don’t tell you that you’ve got 4 times the compute capability but still have the same size data pipe as the single-core chip, leaving the extra cores starved for data most of the time.
If I understand what you are saying, then would it be fair to summarise as:
It is more important to have fast memory and a fast bus than having lots of processor cores?
[I’ll try to get a better/deeper answer on this, but my take is that it’s not a question of one being more important than the other. Rather, it’s kind of like a factory: you have to get the materials in and out efficiently, and you have to process them efficiently. Relative slowness in either one can be a bottleneck. As the ratios between the two change, we need to think of new ways to tune the app to take advantage of the new systems. Simply doing the same operations likely won’t be good enough. –J.]
Interesting article. I did not realize the healing brush used 4th-order partial differential equations to reconstruct textures, that’s pretty impressive.
Memory bandwidth is the Achilles heel of Intel processors. They still haven’t switched from the slow shared memory bus, and sort of paper over the problem by using massive amounts of cache to compensate. In comparison the switched architecture used by either the G5 or AMD Opterons and Athlon64s is far superior, but you still need to have an OS that gives you fine-grained control over processor affinity, and few desktop OSes if any do so. The benchmark that allows you to measure memory performance is the STREAM benchmark, which is included in Xbench.
I do “normal” photoshop editing on large landscape files. If I get a new dual core PC… what advantage would I see adding a second dual core processor? Or even more simply… do people see a large speed increase with the first dual core over a single core?
Thanks, Alan
[A variety of operations will be sped up, but I don’t have a list off the top of my head. I’d suggest seeking out Photoshop benchmarks on representative systems (e.g. on Ars Technica, Macworld, etc.). Keep in mind that other components–RAM and scratch disk speed in particular–also affect performance. –J.]
So are we understanding correctly that as landscape photographers for Apple MacPros:
1) their entry level video will perform for 2D PS CS3 as well as the others?
2) the new 8core machine will NOT provide any significant real-world performance advantages for PS CS3?
3) or would 8core help with Adobe RAW 4.0 and processing 16bit images, plus the handful of filters (Photozoom, Shadows/Highlights, Noise Ninja, Nik Sharpener Pro)?
Searching the usual sites show no CS3 to CS3 benchmarks on MacPros (eg barefeats.com, xlr8yourmac.com, etc.)
[I just saw the news about the new systems, and I don’t know what CS3 benchmarks exist. I’ll look for more info. –J.]
A little confused after reading the support pages of Adobe.com. Specifically:
Processor speed
“[…]Photoshop requires a G3 or faster processor. Photoshop can also take advantage of multiprocessor systems (that is, systems that have two or more PowerPC processors), which are much faster than a single-processor system. All Photoshop features are faster on a multiprocessor system, and some features are much faster.”
[Yeah–that seems a little overstated. I’ll try to get more info to clarify. The reality, as I understand it, is that some operations wouldn’t make sense to split up across multiple cores, so Photoshop doesn’t do it. That’s why if PS is doing something and not pegging both of your processors, it’s not true that the app is running badly & failing to take advantage of your hardware. –J.]
WRT multi-processors, the new MacPro in particular… wouldn’t you say that even though Photoshop isn’t necessarily taking advantage of all those cores in the same way that video rendering does… that if you yourself are a multitasker the 8 cores are much better for you than 4 cores. If you’re burning a DVD, downloading some music, and Photoshopping at the same time for example.
[Yes, I think that’s probably reasonable. It’s similar to what I tell people about RAM: although Photoshop does not address more than 4GB of RAM directly, it’s still useful to have more RAM available for the OS and other apps, not to mention for PS to address through OS caching. On my laptop right now, Safari is eating up ~500MB (!) of real memory and 1.7GB (!!) of virtual memory. So, having extra resources around is generally a Good Thing. –J.]
I use an ‘old’ laptop with an AMD Sempron CPU and went from 1 to 2 gigs of RAM without any noticeable difference. The CPU seems to strain & run hot regardless. I want to upgrade for photo-editing on a budget. Given equal RAM, is there an appreciable difference between the AMD Athlon X 2, AMD Turion 64 X2, Intel Pentium Duo Core and the Intel Core 2 Duo?
Really useful info! I’d love to have your opinion on something specific. I am a freelance illustrator and photoshop is a tool I use on a daily basis. Thing is, 95% of my work in photoshop is with the brush (custom brushes, brush parameters etc.) since I am mostly painting (oh, and lots of layers too). I am usually working with A3-300dpi files or something close to it.
I recently decided to upgrade my hardware and after checking out the new mac pros I realized there isn’t a big difference in price between quad 2.6 and 8-core 2.2
Considering my main use for photoshop what would you recommend?
Does the brush engine benefit from higher CPU speeds or multi cores?
and if not now, is it possible that it will in future versions of PS?
Searching the net high and low for Photoshop performance and what makes Photoshop tick?
So, what makes Photoshop tick in 2010 on both windows and mac based computers.
On the CPU side, is it more Cores or higher Ghz? On the GPU side of things we see a strong advertising from Adobe to promote nVidia’s CX Quadro cards for their CUDA performance, but will games cards (GT-295) give you the same or better performance for less $? Is ATI/AMD even a contender in this?
Where is the bottleneck in the system, be it Mac or PC? Could it be the hard drives, and if so, will SSD drives be the answer for this? Reading Joseph Holmes’ notes on this, there was a bit of a surprise if I read it right: given the smaller byte size used on the scratch disk, the SSD drives were able to take full advantage of their potential. But all this is now a few years old.
So where are we today, when we want to have the best tool to work on, when Photoshop is the tool of choice.
Sorry to wake up this old thread 🙂 but I would love hear if someone has the answer for this.
I hope this will be interesting for everyone involved.
Thanks for taking the time
Henrik Tived