Note: File does not exist in v6.13.7.
Debugging hibernation and suspend
	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)

To check if hibernation works, you can try to hibernate in the "reboot" mode:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition.  If that happens,
hibernation is most likely to work correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine fails
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work:

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
- test the freezing of processes

devices
- test the freezing of processes and suspending of devices

platform
- test the freezing of processes, suspending of devices and platform
  global control methods(*)

processors
- test the freezing of processes, suspending of devices, platform
  global control methods(*) and the disabling of nonboot CPUs

core
- test the freezing of processes, suspending of devices, platform global
  control methods(*), the disabling of nonboot CPUs and suspending of
  platform/system devices

(*) the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when read, /sys/power/pm_test returns a
space-separated list of all available tests (including "none" that represents
the normal functionality) in which the current test level is indicated by
square brackets.

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to rule out random factors).

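The rule of thumb above can be sketched as a loop over the test levels.  The
names pm_test_ladder and PM_DIR are hypothetical, and the loop only notices
writes that return an error; a test that hangs the system has to be observed
by hand.

```shell
#!/bin/sh
# Sketch only: pm_test_ladder is a hypothetical helper name.
PM_DIR=${PM_DIR:-/sys/power}

# pm_test_ladder MODE REPEATS
# Run every test level REPEATS times, from least to most invasive, in
# hibernation mode MODE.  Stops at the first level whose cycle reports an
# error and always switches pm_test back to "none" before returning.
pm_test_ladder() {
    ptl_mode=$1
    ptl_repeats=$2
    ptl_status=0
    for ptl_level in freezer devices platform processors core; do
        ptl_i=1
        while [ "$ptl_i" -le "$ptl_repeats" ]; do
            echo "$ptl_level" > "$PM_DIR/pm_test" &&
            echo "$ptl_mode" > "$PM_DIR/disk" &&
            echo disk > "$PM_DIR/state" || ptl_status=1
            if [ "$ptl_status" -ne 0 ]; then
                echo "level \"$ptl_level\" failed on run $ptl_i" >&2
                break 2
            fi
            echo "level \"$ptl_level\" run $ptl_i passed"
            ptl_i=$((ptl_i + 1))
        done
    done
    echo none > "$PM_DIR/pm_test"
    return "$ptl_status"
}
```

For example, "pm_test_ladder platform 2" runs each level twice in the
"platform" mode of hibernation.
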
If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:
- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what
  drivers have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this may only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.

If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the algorithm:
- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve
  rebooting the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 modules more and try again.

Again, if you find the offending module(s), they must be unloaded every time
before hibernation, and please report the problem with them.

c) Using the "test_resume" hibernation option

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image.  One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration.  Namely,
after doing:

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.

That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware.  That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space randomization, it also can be used to check if failures
to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging

If hibernation does not work on your system even in the minimal configuration
and compiling more drivers as modules is not practical or some modules cannot
be unloaded, you can use one of the more advanced debugging techniques to find
the problem.  First, if there is a serial port in your box, you can boot the
kernel with the 'no_console_suspend' parameter and try to log kernel messages
using the serial console.  This may provide you with some information about the
reasons for the suspend (resume) failure.  Alternatively, it may be possible to
use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to use the
PM_TRACE mechanism documented in Documentation/power/s2ram.txt.

2. Testing suspend to RAM (STR)

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).  Before doing that,
however, you can test STR with the facility described in Section 1.

Namely, after writing "freezer", "devices", "platform", "processors", or "core"
into /sys/power/pm_test (available if the kernel is compiled with
CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
to the given string.  The STR test modes are defined in the same way as for
hibernation, so please refer to Section 1 for more information about them.  In
particular, the "core" test allows you to test everything except for the actual
invocation of the platform firmware in order to put the system into the sleep
state.

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration.  In that case,
you may be able to search for failing drivers by following a procedure
analogous to the one described in Section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

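There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output:

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
	fail: 5
	failed_freeze: 0
	failed_prepare: 0
	failed_suspend: 5
	failed_suspend_noirq: 0
	failed_resume: 0
	failed_resume_noirq: 0
	failures:
	  last_failed_dev:	alarm
				adc
	  last_failed_errno:	-16
				-16
	  last_failed_step:	suspend
				suspend

The "success" field is the number of successful suspend to RAM transitions and
the "fail" field is the number of failed ones.  The remaining counters give the
number of failures at each step of suspend to RAM.  suspend_stats lists only
the last 2 failed devices, together with their error numbers and failed steps.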
249	  last_failed_step:	suspend
250				suspend
251Field success means the success number of suspend to RAM, and field fail means
252the failure number. Others are the failure number of different steps of suspend
253to RAM. suspend_stats just lists the last 2 failed devices, error number and
254failed step of suspend.