Studies of Windows NT performance using dynamic execution traces

Abstract
We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture's PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx.

We conclude from our studies that processor bandwidth can be a first-order bottleneck to achieving good performance. This is particularly apparent when studying commercial benchmarks. Operating system code and data structures contribute disproportionately to the memory access load. We also found that operating system software lock contention was a factor preventing the database benchmark from scaling up on the small multiprocessor, and that the cache coherence protocol employed by the machine introduced more cache interference than necessary.