How to troubleshoot a server problem

Understand, communicate, monitor, check logs, ask for support. Follow these guidelines and make troubleshooting server problems quick and easy.

Troubleshooting servers is a fine art, but there are some methods and tips to get things running smoothly, quickly and easily.

ITIL methodology delves into how to troubleshoot a server or a related issue more deeply, but the general theme is to narrow down the problem as quickly and efficiently as possible.

Take a step back and think about how to logically resolve an issue during an outage. For example, if a user complains that they can't access something, find out if other users have the same issue, eliminating the possibility that the problem is localized to a single end-user device.

This generalist guide was designed to make you think about troubleshooting processes and procedures. Use it in concert with your own guidelines and technical strengths.

Is it widespread?

One of the first pieces of information you need is how widespread the outage or slowdown is, and what it's affecting. What may seem like a network issue could be a stepped-on cable, affecting one PC or small cluster.

If multiple users are afflicted with the same issue, that rules out local factors, such as software misuse or hardware problems on a single PC.

If you have multiple sites, are they all affected? This will help determine if the issue lies with a localized server.
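
The scoping questions above can be sketched as a small triage helper. This is only an illustration: the function name `outage_scope`, the `(user, site)` report format and the wording of the results are assumptions made for the example, not part of any help desk tool.

```python
def outage_scope(reports):
    """Classify how widespread a problem is.

    `reports` is a list of (user, site) pairs, one per complaint.
    The format is hypothetical; adapt it to whatever your help desk records.
    """
    users = {user for user, _ in reports}
    sites = {site for _, site in reports}
    if not users:
        return "no reports yet"
    if len(users) == 1:
        return "single user: check the local device and cabling first"
    if len(sites) == 1:
        return "one site: suspect something local to that site"
    return "multiple sites: suspect a shared server or service"


# Made-up reports for illustration.
print(outage_scope([("ann", "london"), ("bob", "london")]))
```

The point is the order of the questions rather than the code: one user, then one site, then everything.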

Is it the server?

Members of a big IT team are used to finger-pointing between departments. The help desk hears about a slow application and the sysadmin blames the network; the network admin blames the storage area network (SAN); the storage admin blames the software. If you're troubleshooting an issue -- particularly if it's something vague like a slow application -- then identify what area of the data center infrastructure is affected. When multiple servers and applications are malfunctioning, this usually rules out a server issue and points to network or storage arrays. With virtualization, check the physical host location of any affected virtual machines to ensure they're not sharing the same, potentially compromised, hardware.
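
As a rough sketch of the virtualization check, the snippet below groups affected virtual machines by physical host. It assumes you can export a VM-to-host mapping from your hypervisor's inventory; the mapping, the VM names and the function name are all hypothetical.

```python
from collections import defaultdict


def hosts_shared_by_affected_vms(vm_to_host, affected_vms):
    """Return the physical hosts that run more than one affected VM.

    `vm_to_host` is an assumed inventory export: {vm_name: host_name}.
    """
    by_host = defaultdict(list)
    for vm in affected_vms:
        host = vm_to_host.get(vm)
        if host is not None:  # skip VMs missing from the inventory
            by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Made-up inventory for illustration.
inventory = {"web1": "host-a", "web2": "host-a", "db1": "host-b"}
print(hosts_shared_by_affected_vms(inventory, ["web1", "web2", "db1"]))
```

If every affected VM lands on the same host, that host is worth a look before anyone starts blaming the network.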

The process of elimination usually points to a clear culprit, but not always. Find commonality on issues, and try different combinations of factors to narrow down the possibilities. For example, perhaps the issue is that copying from one file share to another is taking too long. Is it slow if you copy from one server to another on the same site? If so, it's not the wide-area network. Is it slow if you copy between local disks on the server? If so, it's not the SAN or local area network. If you have to resort to packet capturing or input/output (I/O) speed tests, troubleshooting could take a long time.
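
The file-copy example can be written down as a tiny elimination routine. This is a minimal sketch that encodes only the two tests described above; the suspect labels and the function name are assumptions made for the example.

```python
def narrow_down_slow_copy(slow_same_site, slow_local_disks):
    """Return the components still under suspicion after two copy tests.

    slow_same_site:   copying between two servers on the same site is slow
    slow_local_disks: copying between local disks on one server is slow
    """
    suspects = {"WAN", "LAN", "SAN", "server"}
    if slow_same_site:
        # The copy never crossed the wide-area network, yet it was still slow.
        suspects.discard("WAN")
    if slow_local_disks:
        # The copy never left the server, so the SAN and LAN were not involved.
        suspects -= {"SAN", "LAN"}
    return sorted(suspects)


print(narrow_down_slow_copy(slow_same_site=True, slow_local_disks=False))
```

Each test that still shows the slowdown removes the components it did not touch, which is the whole idea behind trying different combinations.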

Background

Documentation is an incredibly valuable troubleshooting tool. Easy access to your environment's topology and knowing how an application works on it enables you to quickly troubleshoot server issues.

Have a solid understanding of the data center operations, and ask yourself important questions: How many servers are involved with each application? What are the basic network settings? What infrastructure lives where? This proves valuable, for example, if you have two application servers that clients connect to via round robin DNS, and half of your users report issues. You know from the start that roughly half the users connect to each server, so the fault most likely lies with one of the two, and you won't waste time trying to solve a problem on the healthy server.
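
For the round robin DNS case, one way to find the misbehaving server is to resolve the shared name and probe each address separately. The sketch below uses Python's standard `socket` module; the hostname `app.example.internal`, the port and the helper names are assumptions, and a TCP connect only shows that the port answers, not that the application is healthy.

```python
import socket


def addresses_behind(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses a DNS name resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_is_open(address, port=443, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_each_server(hostname, port=443, resolver=socket.getaddrinfo, probe=port_is_open):
    """Probe every address behind a round robin name individually."""
    return {addr: probe(addr, port) for addr in addresses_behind(hostname, port, resolver)}
```

Calling `check_each_server("app.example.internal")` would return a dictionary of address to reachability, so a single `False` entry points at the server to investigate. The `resolver` and `probe` parameters exist only so the logic can be exercised without a live network.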

Communication

Communication is the key to troubleshooting server problems. For example, your colleague changed a server setting last night. The next day, something doesn't work. You need to know about the change, as it is a likely culprit. Large companies have change process forms so everyone is on the same page, but not every IT team has that luxury (or hindrance, depending on how you look at it).
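
Even without formal change process forms, a simple shared change log makes the "what changed last night?" question quick to answer. The following is a minimal sketch; the log structure (a list of dictionaries with `when`, `who` and `what` keys) and the 24-hour window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def changes_before(incident_time, change_log, window_hours=24):
    """Return changes made in the window before an incident, newest first."""
    start = incident_time - timedelta(hours=window_hours)
    recent = [c for c in change_log if start <= c["when"] <= incident_time]
    return sorted(recent, key=lambda c: c["when"], reverse=True)


# Made-up entries for illustration.
log = [
    {"when": datetime(2014, 12, 1, 22, 30), "who": "colleague", "what": "changed a server setting"},
    {"when": datetime(2014, 11, 20, 9, 0), "who": "admin", "what": "replaced a disk"},
]
for change in changes_before(datetime(2014, 12, 2, 8, 0), log):
    print(change["when"], change["who"], change["what"])
```

The most recent change is listed first because it is usually the first suspect.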

Communication helps the data center team prepare and proactively watch the environment when a new application or other change goes into production. Otherwise you'll reactively ask about the new application, its deployment and resource demands when end users start to complain about it not working.

Monitoring

Save time troubleshooting server problems by having an ongoing overview of operations.

There are many monitoring tools available for different sizes and structures of data centers. When configured correctly, they track key metrics, such as latency and I/O speeds, which give you the ammunition to hassle the storage or network people as appropriate. Monitoring tools also alert you to potentially useful information, such as a drive with 1% free disk space that's primed to cause server issues.
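
A dedicated monitoring product will do far more than this, but the free-space check itself is simple enough to sketch with Python's standard `shutil.disk_usage`. The 10% threshold, the function names and the alert wording are assumptions; pick values that suit your environment.

```python
import shutil


def percent_free(path):
    """Return the percentage of free space on the volume containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def low_space_alerts(paths, threshold_percent=10.0, measure=percent_free):
    """Return one alert string per path whose free space is below the threshold."""
    alerts = []
    for path in paths:
        free = measure(path)
        if free < threshold_percent:
            alerts.append(f"{path}: only {free:.1f}% free")
    return alerts


# Check the volume holding the current directory.
for alert in low_space_alerts(["."]):
    print(alert)
```

Run on a schedule, a check like this flags the nearly full drive before it causes an outage rather than after.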

Many products also monitor services, so if a critical service crashes and stops, the tool will send an alert or automatically attempt to restart it based on the rules you set.

Check the logs

Surprisingly, server and related logs are often overlooked.

When an issue comes up, technicians think they know what the issue is and spend hours trying to prove their theory. But if they spent a few minutes looking at the logs, they would often see the exact cause of the problem recorded. Permission issues, for example, are easier to fix if you know which two components are trying to communicate, and with what account.

Check the Event Viewer logs on Microsoft Windows or syslogs on Unix/Linux servers for warnings and errors. Application logs are also worth reading, as they often contain error data that points you toward the root cause.
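
A first pass over a text log can be as simple as filtering for lines that mention a warning or error. The sketch below is an assumption-laden starting point: the keyword list is a guess at common severity words, and the sample lines are made up rather than taken from any real application.

```python
import re

# Assumed keyword list; extend it to match the wording your applications use.
SEVERITY = re.compile(r"\b(error|warn(?:ing)?|crit(?:ical)?|fail(?:ed|ure)?)\b", re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) for every line that mentions a problem."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SEVERITY.search(line)
    ]


# Made-up log lines for illustration.
sample = [
    "10:00:01 myapp: started\n",
    "10:00:05 myapp: ERROR permission denied for account svc_app\n",
    "10:00:09 myapp: warning: retrying connection\n",
]
for number, text in suspicious_lines(sample):
    print(number, text)
```

The same function accepts an open file object, so it can be pointed at a syslog or application log file; the file's location depends on the system and is not assumed here.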

Support

Some admins consider calling a vendor and logging a ticket a defeat, but you shouldn't. After checking the basics, spend a few minutes logging a call, rather than waiting until you're several hours into an outage situation.

Take the time when things are running smoothly to check what your support service-level agreement is with major data center vendors. If the vendor won't contact you until the next working day, logging the problem as early as possible can stave off a frustrating night.

Many vendors have specific instructions available online on how to troubleshoot server issues. Check resources from the vendor's knowledgebase and online forums.

It can be frustrating when you can't troubleshoot server problems and fix the issue within the first five minutes, but don't be afraid to ask for help. Preparation, communication and understanding your environment are the tools of a hero who saves the day.

This was last published in December 2014

3 comments
What's your best server troubleshooting tip for less-experienced administrators?
When the server goes down, you hear about it, and you want to make sure that it comes back up again as soon as you can get it running. First, don't panic. If necessary, shut off your phone for a while. Answering calls isn't going to get the server up again. Most importantly, however, check the physical layer. Sometimes the server is suffering from remote access issues or worse, but always make sure that everything is plugged in securely. You don't want to spend hours tearing it apart only to realize that everything can be fixed by jiggling a cord.
My Dell server suddenly stopped taking the ISP uplink from my firewall and could not be pinged, and it showed the error message "Hardrive 1 fault check and clean". Please let me know what the problem was on my server.