LINUX.ORG.RU

HDD error


Messages like these keep pouring into /var/log/messages:

Apr 10 02:36:17 localhost kernel: [  645.016019] ata4: link is slow to respond, please be patient (ready=0)
Apr 10 02:36:22 localhost kernel: [  650.000019] ata4: device not ready (errno=-16), forcing hardreset
Apr 10 02:36:22 localhost kernel: [  650.000030] ata4: soft resetting link
Apr 10 02:36:23 localhost kernel: [  650.164938] ata4.00: configured for UDMA/133
Apr 10 02:36:23 localhost kernel: [  650.164965] ata4: EH complete

And dmesg shows this:

[   38.120047] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   38.120102] ata4.00: BMDMA stat 0x24
[   38.120131] ata4.00: failed command: WRITE DMA
[   38.120168] ata4.00: cmd ca/00:08:80:d8:4b/00:00:00:00:00/ec tag 0 dma 4096 out
                        res 51/84:01:87:d8:4b/00:00:00:00:00/ec Emask 0x10 (ATA bus error)
[   38.120264] ata4.00: status: { DRDY ERR }
[   38.120292] ata4.00: error: { ICRC ABRT }
[   38.120330] ata4: soft resetting link
[   38.284838] ata4.00: configured for UDMA/133
[   38.284887] ata4: EH complete
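
The ICRC flag in the error line above stands for an interface CRC error, meaning the data was corrupted on the way between the controller and the drive rather than on the platters. Here is a minimal Python sketch for tallying such flags from saved dmesg output. The function name count_ata_error_flags and the regular expression are illustrative assumptions, not part of any tool, and the pattern is only written against the libata line format shown in this thread.

```python
import re
from collections import Counter

# Matches libata lines such as "ata4.00: error: { ICRC ABRT }".
# Assumption: the format is the one seen in the dmesg output above.
ERROR_LINE = re.compile(r"(ata\d+(?:\.\d+)?): error: \{([^}]*)\}")


def count_ata_error_flags(log_text):
    """Count error flags per ATA device in dmesg-style text."""
    counts = Counter()
    for match in ERROR_LINE.finditer(log_text):
        device = match.group(1)
        for flag in match.group(2).split():
            counts[(device, flag)] += 1
    return counts


# Two lines copied from the dmesg output in this post.
sample = """\
[   38.120264] ata4.00: status: { DRDY ERR }
[   38.120292] ata4.00: error: { ICRC ABRT }
"""

counts = count_ata_error_flags(sample)
for (device, flag), n in sorted(counts.items()):
    print(device, flag, n)
```

To run it against a live system, one could pass the text of dmesg instead of the sample string. Only "error:" lines are counted, so the "status:" line is ignored on purpose.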

I wouldn't even have noticed, but some window popped up saying something like blah-blah error, blah-blah SMART, will retry within 24 hours. I closed it on autopilot, then it dawned on me that something was wrong, so I went digging.

Where should I look? Here is what I found myself:

smartctl -H /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   054   045   045    Old_age   Always   In_the_past 46

So what now?



Last edited by: cetjs2 (2 edits in total)
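
On the "marginal" attribute in the post above: In_the_past in the WHEN_FAILED column means the normalized WORST value reached THRESH at some point, even though the current VALUE is back above it. A rough Python sketch for picking such rows out of the attribute table follows. The function name parse_attribute_rows is made up for illustration, and the column positions are assumed from the smartctl 6.6 output pasted in this thread. The 100 minus value reading of attribute 190 is also an assumption; it fits this drive, where 100 - 54 = 46 matches the raw value.

```python
def parse_attribute_rows(smartctl_text):
    """Parse rows of the smartctl attribute table into dicts."""
    rows = []
    for line in smartctl_text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns, a numeric ID and a 0x... flag.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x")):
            rows.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "when_failed": fields[8],
                "raw": fields[9],
            })
    return rows


# Rows copied from the smartctl output in this thread.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   054   045   045    Old_age   Always   In_the_past 46
194 Temperature_Celsius     0x0022   104   095   000    Old_age   Always       -       46
"""

rows = parse_attribute_rows(sample)
# Anything other than "-" in WHEN_FAILED marks a past or current failure.
marginal = [r for r in rows if r["when_failed"] != "-"]
for r in marginal:
    # Assumption: for attribute 190 the normalized value is 100 - temperature.
    print(r["name"], r["when_failed"], "worst temperature about", 100 - r["worst"], "C")
```

Under that assumption the drive once reached roughly 55 C, which is what tripped the threshold. That says nothing about the ata4 errors themselves.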

In reply to: comment by anonymous
smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE Serial ATA
Device Model:     WDC WD1600JS-00MHB0
Serial Number:    WD-WMANK1004452
Firmware Version: 02.01C03
User Capacity:    160 041 885 696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Mon Apr 10 04:03:29 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 6300) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  74) minutes.
Conveyance self-test routine
recommended polling time: 	 (   6) minutes.
SCT capabilities: 	       (0x103f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   221   185   021    Pre-fail  Always       -       3916
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4667
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   007   007   000    Old_age   Always       -       68540
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       4585
190 Airflow_Temperature_Cel 0x0022   054   045   045    Old_age   Always   In_the_past 46
194 Temperature_Celsius     0x0022   104   095   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       79
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 1
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 3002 hours (125 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 53 01 e0 4f c2 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 00 00      00:06:54.632  SMART WRITE LOG
  b0 d5 01 e0 4f c2 00 00      00:06:54.632  SMART READ LOG
  ec 00 01 00 00 00 00 00      00:06:54.539  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 00      00:06:54.538  IDENTIFY DEVICE
  ca 00 08 18 db d1 0d 00      00:06:49.164  WRITE DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
alex777
() thread author
In reply to: comment by alex777

Googling turned up this:

smartctl --test=long /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 74 minutes for test to complete.
Test will complete after Mon Apr 10 06:23:51 2017

Use smartctl -X to abort test.

And then check the result like this?

smartctl -l selftest /dev/sda

alex777
() thread author
In reply to: comment by alex777

UDMA_CRC_Error_Count

Replace the cable.

The test result also shows up in smartctl -a.

anonymous
()

If there are no bad sectors, my bet is a lousy cable or oxidized contacts.

anonymous
()

Seconding the anonymous poster. If UDMA_CRC_Error_Count keeps growing, the first thing to do is replace the cable.

Radjah ★★★★★
()
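
The raw UDMA_CRC_Error_Count is, as far as I know, a lifetime counter that does not reset, so the 79 in the output above only matters if it keeps climbing. A small Python sketch for comparing two saved smartctl -A snapshots follows. The function name crc_error_count is invented for illustration, the column layout is assumed from the smartctl 6.6 output in this thread, and the "later" value of 85 is purely hypothetical.

```python
def crc_error_count(smartctl_text):
    """Return the raw UDMA_CRC_Error_Count from smartctl -A text, or None."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute 199 is UDMA_CRC_Error_Count; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Row copied from the smartctl output in this thread.
earlier = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       79"
# Hypothetical later snapshot, made up purely to show the comparison.
later = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       85"

growth = crc_error_count(later) - crc_error_count(earlier)
if growth > 0:
    print("CRC errors grew by", growth, "- check the SATA cable and connectors")
else:
    print("CRC error count is stable")
```

If the count stays put after swapping or reseating the cable, the old errors are just history.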
In reply to: comment by AntonyRF

A disk is not a hand. Google published its hard-drive statistics a long time ago: the longest-lived drives ran at about 40 degrees, and both below and above that the service life got shorter. Too lazy to look for the link.

legolegs ★★★★★
()
Last edited by: legolegs (1 edit in total)
In reply to: comment by legolegs

Who knows. Today I went through the whole case and reseated everything; after several hours of active use the disk was hot to the touch, though bearable. Since then there have been no more errors so far.

It's all rather strange: it had worked fine for half a year without errors, and then this out of the blue. The SMART tests showed nothing, apart from the fact that the disk is old, but that was obvious anyway.

alex777
() thread author