My Take On WannaCry

Reading media coverage of the WannaCry ransomware attack has been excruciatingly frustrating because little to no information was offered on how infection happens or how to protect yourself.

As an IT professional and a user, I found the coverage frustrating and unhelpful; if I couldn’t find the right answers, something is seriously wrong.  I couldn’t find the important information in any of the mainstream articles, so a novice or amateur user would certainly have no chance of protecting themselves.

How Did WannaCry Infect and Spread?

Long version here from Malwarebytes

One of the key ways is still the oldest “phishing” trick in the book: e-mail, where many users are tricked into opening infected attachments.  This was not prominent in media coverage, and a simple warning or announcement could have prevented a lot of new infections.  I believe this is a key factor that has not been discussed: many networks sit behind NAT with external SMB access blocked, so having users on the LAN run the worm is an easy way to get inside and spread the infection to areas that are hardened on the outside.

The more technical explanation is that there is an exploit called “ETERNALBLUE”, a hacking tool leaked from the NSA, which exploits a weakness in Microsoft’s implementation of SMB (Server Message Block, the file-sharing protocol).  This has been widely reported, but the simple way to prevent automatic infection through this method has not.

Once a machine is infected, the worm scans your LAN and then the internet to spread further, which quickly multiplied the damage and scope of this attack.

How to protect yourself?

  1. First and foremost, update Microsoft Windows regardless of version (whether you have XP, Vista, 7, 10, Server 2012 or any other Server edition), because all Windows versions are apparently affected by the SMB vulnerability patched in MS17-010 and exploited by ETERNALBLUE/WannaCry.
  2. Disable SMB/file sharing in Windows, and if that is not possible, at least use firewall settings to block SMB/CIFS file sharing (TCP ports 139 and 445).
  3. If the above is not possible, physically unplug any impacted machines from the network (it could be as simple as disabling all ports on your switch or even unplugging entire switches if possible).
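
To check whether a machine still exposes SMB after step 2, a quick TCP connection test against port 445 can help. The following is only a minimal sketch in Python, not a vulnerability scanner: the function name smb_port_open and the 2-second timeout are my own illustrative choices, and an open port only means SMB is reachable, not that the host is unpatched.

```python
import socket


def smb_port_open(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 445 is the standard SMB-over-TCP port.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed
        return False
```

Calling smb_port_open("192.168.1.10") (a placeholder address) and getting True back would suggest the firewall rule or service change has not taken effect on that host.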

Who is to blame?

There is plenty of blame to go around, but currently a lot of it is coming from Microsoft, which is blaming users for not patching and the NSA for hoarding these exploits and not notifying Microsoft or users beforehand.

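In all fairness, Microsoft issued the MS17-010 patches for supported versions, including Vista, on March 14th, 2017, and released emergency patches for unsupported OSes like XP once the outbreak began in May 2017.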

Many have mused that the NSA should have at least notified Microsoft the moment they realized their hacking tools were leaked.

At the end of the day the question is how could Microsoft have left open such a serious vulnerability for so long?  Was it an intentional backdoor and was it collaboration between Microsoft and the NSA or other third parties?

Some Can’t Patch

Some systems run on isolated internal networks on their own LAN, so they wouldn’t have been patched, yet they were still infected.  To make matters worse, these are more likely to be critical data and infrastructure.

Other machines are not managed properly or remotely and are deployed with internet access making them sitting ducks for these types of attacks.

There are also some who just don’t patch because the risk of impacting existing services is too great, although I would argue that the risk of not patching, and of not upgrading or migrating your applications to a more secure platform, is much higher if you get hit with ransomware like this.

These Issues Are Nothing New

With the Snowden revelations, many have worried that US tech companies forced to provide backdoor access to the NSA would leave users vulnerable should other hackers discover the vulnerabilities or intentional backdoors on their own or, as in this case, when the tools and exploits were somehow leaked.

In the wider scope of things, Microsoft has seen worms of this scale in the past; it’s nothing new.  There are no worldwide protocols for notifying users or defending against such worms, and this will certainly become an increasing problem as more and more devices come online, especially IoT and other connected devices that we don’t think about and that don’t get patched or have no easy or automatic way of updating.

Why I Founded Techrich Corporation of Hong Kong, China

This is a question that I’ve been asked a lot, usually along with whether there is any duplication or overlap with compevo.  Techrich is an extension and complement to compevo and opens up new possibilities for our clients.

Being incorporated and based in Hong Kong allows us to provide more leverage and advantages and fills any gap that compevo may not have been able to fulfill.  In terms of data storage, security and connectivity Hong Kong cannot be beat.  It has the best of nearly all worlds.

Why Hong Kong?

This is the natural next question.  Hong Kong is economically, politically and technically stable in terms of both IT infrastructure and ecosystem, and most importantly its link to the outside world is fast and neutral.  Hong Kong is still the internet gateway to China, being directly connected to Mainland China.

Hong Kong also has a large Big Data industry and strong demand due to its reputation as a financial hub of the world, which makes it a perfect ecosystem and fit for Techrich’s goals.

Contrary to some common belief, Hong Kong is not in the Pacific Ring of Fire and is relatively free of natural disasters, making it not only an ideal location on a world scale but perfect within Asia too.  Hong Kong does experience typhoons but they are rarely devastating and have little to no impact on IT or datacenter operations in Hong Kong.  In fact Hong Kong’s power grid is known to be one of the most reliable and stable in the world.

In terms of internet routing Hong Kong is quite neutral with excellent connectivity to all of Asia, North America and Europe, but of particular importance is the capability of very low ping times into Mainland China that only Hong Kong can provide.

Slow Internet in China especially Shanghai and Beijing!

A colleague sent me this article asking my thoughts: http://travel.cnn.com/shanghai/life/your-internet-connection-feel-slow-its-probably-not-your-router-684008/

A lot of people automatically assume the best internet experience will be found in Beijing or Shanghai, but we’ve never known that to be the case.  A lot of people will also automatically blame the GFW (Great Firewall), but in my experience the cause simply seems to be packet loss due to congestion.  China has an enormous demand for bandwidth since it has the highest number of internet users in the world (736 million as of 2017!).  This is why for compevo and Techrich we’ve always avoided the major centers.  However, bandwidth and good connectivity inside and outside of China are not always dictated by location, and even backbones within the same city are not created equal.  It takes a lot of research to get access to reliable and fast bandwidth in China, but that is a completely different book to write.

With that in mind, there are many ways around the slow internet in China.  If your local connection is fast and your connectivity inside China is fast, you could simply use an internet acceleration service that runs through a less congested part of China or even Hong Kong.  It’s a great way to optimize your internet.

My recommendation even in 2017 is that there are many places in China with fast and reliable internet, but the major hubs are not likely to be among them anytime soon because their user bases are so large.

An Opportunity To Share Values

With so many customers asking us if someone from “ABC country” and “ABC religion” is welcome as a customer, I thought it was finally time for us to go public on this very important topic.  There is nothing worse than someone being treated differently or in a negative way, whether verbally, physically or through discriminatory policies, simply because of their race, religion, creed, culture etc.  We have clients and staff of all different backgrounds from around the world, and I am pleased that we have such diversity; accordingly, this filters down into company policy as well.

The above is something I have always strongly believed in and was ingrained in me from childhood, that people of all ethnic and religious groups must live respectfully and peacefully.  Aside from a desire to see peace in the world, anything other than mutual respect and treatment of all people creates divisions that harm society and the economy.  There are no winners in such a situation.

With that said, I am still shocked that many CEOs and managers of various companies are apparently reluctant to clarify their views and policies on these issues.  Under normal circumstances it may be considered taboo or uncomfortable to talk about, but with so much intolerance spreading throughout the world, there is definitely understandable cause for concern.

I do not take offense to customers asking us about our views and policies and I also think all businesses need to take a proactive approach with internal and public policies guiding staff and patrons.  People of all ethnic and religious groups can and should feel safe anywhere, but in these times it is understandable if some don’t, especially if a business refuses to clarify their beliefs and policies surrounding these issues.

As uncomfortable as some may find it to have this talk with their employees or to be asked about it by customers, I think we should all be understanding, and if we are truly inclusive and tolerant it shouldn’t be hard to reassure our customers and state that these are our views and our company policy.  To act with compassion and tolerance is the right thing to do, and as both a person and a business owner I will strive to make this clearer and more public, and I hope other companies will do the same.

 

 

Teaching Code To Kids

I believe teaching coding to kids in any form is a benefit for them regardless of their career path.  It really exercises the mind in solving problems and requires a lot of creativity.  If they can learn coding at a young age, it is likely they will continue to learn well in other areas for the rest of their lives.

I don’t know if there is a magic age at which to start, but if a child is able to use a computer to play games, they are probably capable of being introduced.  I think it’s important to make it as fun as possible and without too much pressure, which is obviously difficult at a younger age.  Part of getting them there is not just the coding: if they start more advanced academics at a young age, they are more likely to have the discipline to think things through.

A quick Google search makes it look like there is growing interest for kids and there are now platforms and services intended to help.

Another great thing about kids learning to code is that children in impoverished areas of the world who have access to a computer can be on a level playing field.  In IT you can work from almost anywhere in the world and your talent can be recognized.

Being an Expert on China’s Internet And Getting Ahead

After years of offering internet services in China, a lot of our customers consider us experts on the Chinese internet.  I’ve observed that, on top of our typical IT consulting, we’ve frequently been called on by firms as their consultant for all things internet in China.

In China the first thing people think of is regulations and rules and it often sounds more scary than it is or has to be.  Some firms have needlessly neglected the Chinese market over rules that may not even apply to their usage or simply over the concern of seeking an ICP license for hosting purposes (which is not all that hard if you have a presence in China).  Don’t get me wrong, like any country there are regulations to be followed and understood, but most of these are well-documented knowns.

The biggest challenge aside from the known regulatory issues in China is finding quality and reliable bandwidth, both locally (for within China) and overseas (outside China).  We get a lot of clients switching to us in China because they say our bandwidth is the fastest and most reliable.

A lot of people believe that a certain city or location guarantees their service will be slower or faster, but this is certainly not true whether you are on China Unicom or China Telecom.  There are some things I will agree with, however, such as that Telecom seems to have lower ping overall to most parts of the world.  But lower ping in China still does not guarantee better speeds, for many different reasons, and some circuits with higher ping have consistently outperformed to certain parts of the world.
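
One way to see why lower ping alone does not guarantee better speeds is the well-known Mathis approximation for steady-state TCP throughput, roughly MSS / (RTT × √loss) times a constant near 1.22. The sketch below is only an illustration under that simplified model; the function name and the RTT and loss figures are made-up examples, not measurements from any of our circuits.

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput (Mbit/s) per the Mathis model."""
    c = 1.22  # approximate constant from the Mathis et al. model
    bits_per_second = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return bits_per_second / 1e6


# Hypothetical circuits: A has lower ping but more packet loss than B.
circuit_a = mathis_throughput_mbps(1460, rtt_s=0.050, loss_rate=0.01)
circuit_b = mathis_throughput_mbps(1460, rtt_s=0.200, loss_rate=0.0001)
print(f"A (50 ms, 1% loss):     {circuit_a:.1f} Mbit/s")
print(f"B (200 ms, 0.01% loss): {circuit_b:.1f} Mbit/s")
```

Under these assumed numbers the higher-ping circuit B comes out ahead, because packet loss from congestion hurts a single TCP stream more than the extra latency does.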

There are many reasons why people have problems with the internet in China and I strongly believe one of the biggest factors is simply congestion.  Even in many parts of North America users can attest that the internet has been slow for them at some point over the years and it becomes obvious during summer months there is more usage.

Now imagine the same thing in China, only with a population of nearly 1.4 billion people (2015) compared to all of North America’s 533 million.  There is a high population density in most major centers of China, and this is why home and office speeds have not been as fast as in some other countries in Asia.

This is where I would always advise anyone who says “we want our servers only in Beijing and Shanghai” to think again.  You are going to be dealing with a lot of internet congestion at the local level, more firewall issues and the chance of disaster impacting major metropolitan areas is much higher.

Our course has been different from the start: we avoid congested city centers, find less used fiber and provision our circuits on that basis.  But it’s still not enough to rely on a certain area or even city.  We’ve tested dozens of circuits in some areas and found that at best a handful will have good speeds, and many will still only be good within China.

Finding the right mix takes a lot of time, testing and travel in China, and it is why we’ve been so successful in helping our clients get ahead in China.

Internet in China is a constantly evolving and complex subject to say the least, and what may have been true days or weeks ago may not be true anymore.  I always advise people that unless you have contacts in all parts of China and are willing to travel, you absolutely must find a provider in China that is not restricted to a single area and is familiar with the networks’ strengths and weaknesses in all areas.

And last but not least, it’s about having the contacts to try and improve routing issues as 9/10 providers in China are not able or willing to respond to network issues on the backbone.

Those who are very serious about China will often obtain servers with us on multiple Telecom and Unicom circuits in various areas of China.  This has consistently been a winning tactic for our clients for a long time.  Most clients want servers on both China Telecom and China Unicom not only for redundancy but also for better connectivity, as the two providers often have issues communicating with each other.

I still feel it is not as bad as some have expressed, but it depends on the area.  As far as the backbone goes, things are usually less problematic as long as the Telecom and Unicom servers are separated by a great distance.  But of course consumer grade Telecom and Unicom service is far different, and that is where the issues really come into play and make both providers necessary, at least if you are serving the local market in China.

The above is really just a few of the things we have seen, in a nutshell; there is enough going on to fill a whole book.

Dedicated Server Uptime Samples

I just logged into two random dedicated servers and I am always happy about the uptimes we have:

13:05:37 up 960 days, 21 min,  1 user,  load average: 0.00, 0.01, 0.05

14:11:14 up 835 days, 18:01,  6 users,  load average: 0.09, 0.02, 0.01

Neither server has ever been down; both have been running since the day they were racked, for the durations shown above.

The reason our uptime is always fantastic is not only that our facilities are outside the core disaster areas but also that we never overload or oversell our servers.  We are not a budget provider, but still offer excellent value in my opinion.  We’ve had a lot of clients switch to us from other hosts primarily based on the reasoning “no amount of features or gimmicks in the world matter if you have an unreliable service”.

 

My first 8TB Seagate Drive

Just for memories I kept the dmesg output from when I hotplugged this drive into my LSI Logic SAS2008 based card.  For those who are wondering, any chipset that is 3TB capable will work fine for 8+ TB drives.  In fact I found that if I plugged this drive into a chipset that didn’t support 3TB, not only did it not work but most Linux machines literally froze (I expected that the drive would just show itself as being 2TB, but this was not the case).

Updated Performance Observations

These are said to be for backup only and are called slow, but at 5980RPM what do you expect?  However, the raw transfer rates are outstanding, some of the best you will see for a mechanical drive.  So far they are working fine in an mdadm RAID 10 with fantastic performance, as demonstrated below:

Is the Seagate ST8000AS0002-1NA17Z 8TB 5980RPM really such a slow drive?
These are some of the best sync speeds I’ve seen in general, about 150-160MB/s.
Time will tell, but so far putting these drives into a RAID array looks like a fine idea.

Device Model:     ST8000AS0002-1NA17Z

md99 : active raid10 sdd1[2] sde1[0]
7813894144 blocks super 1.2 512K chunks 2 far-copies [2/1] [U_]
[>....................]  recovery =  0.9% (75532224/7813894144) finish=785.2min speed=164232K/sec

md99 : active raid10 sdd1[2] sde1[0]
7813894144 blocks super 1.2 512K chunks 2 far-copies [2/1] [U_]
[>....................]  recovery =  1.3% (109094080/7813894144) finish=823.5min speed=155914K/sec

md99 : active raid10 sdd1[2] sde1[0]
7813894144 blocks super 1.2 512K chunks 2 far-copies [2/1] [U_]
[>....................]  recovery =  3.1% (245071808/7813894144) finish=793.3min speed=159008K/sec

md99 : active raid10 sdd1[2] sde1[0]
7813894144 blocks super 1.2 512K chunks 2 far-copies [2/1] [U_]
[>....................]  recovery =  4.9% (386057920/7813894144) finish=787.3min speed=157228K/sec
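
The recovery lines above can be sanity-checked with a little arithmetic. This is a rough sketch in Python that parses one such line and recomputes the speed and remaining time; the function name parse_recovery is my own, and it assumes the block counts and the "K" in the speed figure are both in units of 1024 bytes.

```python
import re

# One recovery line copied from the /proc/mdstat samples above
LINE = "recovery =  0.9% (75532224/7813894144) finish=785.2min speed=164232K/sec"

PATTERN = re.compile(r"\((\d+)/(\d+)\)\s+finish=([\d.]+)min\s+speed=(\d+)K/sec")


def parse_recovery(line: str) -> dict:
    """Extract progress figures from an mdstat recovery/resync line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recovery line")
    done, total, speed_k = int(m.group(1)), int(m.group(2)), int(m.group(4))
    return {
        "percent": 100.0 * done / total,
        "speed_mib_s": speed_k / 1024,              # assumes K = 1024 bytes
        "eta_min": (total - done) / speed_k / 60,   # remaining blocks / rate
        "reported_finish_min": float(m.group(3)),
    }


info = parse_recovery(LINE)
print(info)
```

The recomputed remaining time lands within a fraction of a minute of the finish= value the kernel printed, which suggests the estimate is simply remaining blocks divided by current speed.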

And who says these drives are bad and slow in a 2 drive mdadm RAID 10 (far layout)?
338-381MB/s is very respectable!

sudo dd if=/dev/md99 of=/dev/null bs=1M count=5000 skip=50000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 15.5328 s, 338 MB/s

sudo dd if=/dev/md99 of=/dev/null bs=1M count=5000 skip=500000
[sudo] password for user:
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 13.7528 s, 381 MB/s

user@box:~$ sudo dd if=/dev/md99 of=/dev/null bs=1M count=5000 skip=500000
[sudo] password for user:
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 2.7552 s, 1.9 GB/s
user@box:~$ sudo dd if=/dev/md99 of=/dev/null bs=1M count=5000 skip=5000000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 15.1745 s, 346 MB/s

sudo dd if=/dev/md99 of=/dev/null bs=1M count=5000 skip=600000
[sudo] password for user:
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 14.8966 s, 352 MB/s
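
As a cross-check, the rates dd prints can be recomputed from the byte counts and times in the runs above. The snippet below is a small sketch (the helper name dd_rate_mb_s is mine). Note that the 1.9 GB/s run re-reads the same offset as the run before it, so it was most likely served from the page cache rather than from the disks; adding iflag=direct to dd should bypass the cache for a fairer number.

```python
def dd_rate_mb_s(bytes_copied: int, seconds: float) -> float:
    """Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes)."""
    return bytes_copied / seconds / 1_000_000


# (bytes, seconds) pairs taken from the dd runs above
runs = [
    (5242880000, 15.5328),
    (5242880000, 13.7528),
    (5242880000, 2.7552),   # repeat of the previous offset, likely cached
    (5242880000, 15.1745),
    (5242880000, 14.8966),
]

for size, secs in runs:
    print(f"{secs:>8.4f} s -> {dd_rate_mb_s(size, secs):7.1f} MB/s")
```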

 

The magic in Linux that happens when you hotplug this 8TB Seagate drive:

Jun  3 22:46:24 mybox kernel: [5110723.801829] scsi 6:0:0:0: Direct-Access     ATA      ST8000AS0002-1NA AR17 PQ: 0 ANSI: 6
Jun  3 22:46:24 mybox kernel: [5110723.801857] scsi 6:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
Jun  3 22:46:24 mybox kernel: [5110723.801866] scsi 6:0:0:0: SATA: enclosure_logical_id(0x500605b0079b94b0), slot(7)
Jun  3 22:46:24 mybox kernel: [5110723.801979] scsi 6:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jun  3 22:46:24 mybox kernel: [5110723.801988] scsi 6:0:0:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jun  3 22:46:24 mybox kernel: [5110723.805791] sd 6:0:0:0: Attached scsi generic sg7 type 0
Jun  3 22:46:24 mybox kernel: [5110723.806349] sd 6:0:0:0: [sdg] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
Jun  3 22:46:24 mybox kernel: [5110723.806362] sd 6:0:0:0: [sdg] 4096-byte physical blocks
Jun  3 22:46:24 mybox kernel: [5110723.950714] sd 6:0:0:0: [sdg] Write Protect is off
Jun  3 22:46:24 mybox kernel: [5110723.950730] sd 6:0:0:0: [sdg] Mode Sense: 7f 00 10 08
Jun  3 22:46:24 mybox kernel: [5110723.952804] sd 6:0:0:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jun  3 22:46:24 mybox kernel: [5110724.116071]  sdg: unknown partition table
Jun  3 22:46:24 mybox kernel: [5110724.281364] sd 6:0:0:0: [sdg] Attached SCSI disk

Oops, I used fdisk first, but its DOS (MBR) partition table can’t support an 8TB disk, so use gdisk instead:
sudo fdisk /dev/sdg

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x9f3ad455.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won’t be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: The size of this disk is 8.0 TB (8001563222016 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): p

Disk /dev/sdg: 8001.6 GB, 8001563222016 bytes
255 heads, 63 sectors/track, 972801 cylinders, total 15628053168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x9f3ad455

Device Boot      Start         End      Blocks   Id  System

Command (m for help): q
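
The 2199023255040-byte limit in the fdisk warning above follows from the DOS/MBR partition table storing sector numbers in 32-bit fields. A quick worked check in Python, using the sector count and sector size printed by fdisk (the variable names are just for illustration):

```python
SECTOR_BYTES = 512                 # logical sector size reported by the drive
DISK_SECTORS = 15_628_053_168      # total sectors from the fdisk output

# MBR stores sector counts in 32-bit fields, so the largest value is 2**32 - 1
MBR_MAX_SECTORS = 2**32 - 1

disk_bytes = DISK_SECTORS * SECTOR_BYTES
mbr_limit_bytes = MBR_MAX_SECTORS * SECTOR_BYTES

print(f"disk size  : {disk_bytes} bytes ({disk_bytes / 1e12:.2f} TB)")
print(f"MBR limit  : {mbr_limit_bytes} bytes ({mbr_limit_bytes / 2**40:.2f} TiB)")
print(f"fits in MBR: {DISK_SECTORS <= MBR_MAX_SECTORS}")
```

The drive has roughly 3.6 times more sectors than a 32-bit field can address, which is why a GPT label is needed.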

gdisk supports 3TB+ drives no problem and the interface is the same as fdisk:
sudo gdisk /dev/sdg
GPT fdisk (gdisk) version 0.8.8

Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present

Creating new GPT entries.

Command (? for help): p
Disk /dev/sdg: 15628053168 sectors, 7.3 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 99C74283-3F66-4AEB-B427-210253274010
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 15628053134
Partitions will be aligned on 2048-sector boundaries
Total free space is 15628053101 sectors (7.3 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

Command (? for help): n
Partition number (1-128, default 1): p
First sector (34-15628053134, default = 2048) or {+-}size{KMGTP}: 1
First sector (34-15628053134, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-15628053134, default = 15628053134) or {+-}size{KMGTP}:
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): L
0700 Microsoft basic data  0c01 Microsoft reserved    2700 Windows RE
4100 PowerPC PReP boot     4200 Windows LDM data      4201 Windows LDM metadata
7501 IBM GPFS              7f00 ChromeOS kernel       7f01 ChromeOS root
7f02 ChromeOS reserved     8200 Linux swap            8300 Linux filesystem
8301 Linux reserved        8302 Linux /home           8400 Intel Rapid Start
8e00 Linux LVM             a500 FreeBSD disklabel     a501 FreeBSD boot
a502 FreeBSD swap          a503 FreeBSD UFS           a504 FreeBSD ZFS
a505 FreeBSD Vinum/RAID    a580 Midnight BSD data     a581 Midnight BSD boot
a582 Midnight BSD swap     a583 Midnight BSD UFS      a584 Midnight BSD ZFS
a585 Midnight BSD Vinum    a800 Apple UFS             a901 NetBSD swap
a902 NetBSD FFS            a903 NetBSD LFS            a904 NetBSD concatenated
a905 NetBSD encrypted      a906 NetBSD RAID           ab00 Apple boot
af00 Apple HFS/HFS+        af01 Apple RAID            af02 Apple RAID offline
af03 Apple label           af04 AppleTV recovery      af05 Apple Core Storage
be00 Solaris boot          bf00 Solaris root          bf01 Solaris /usr & Mac Z
bf02 Solaris swap          bf03 Solaris backup        bf04 Solaris /var
bf05 Solaris /home         bf06 Solaris alternate se  bf07 Solaris Reserved 1
bf08 Solaris Reserved 2    bf09 Solaris Reserved 3    bf0a Solaris Reserved 4
bf0b Solaris Reserved 5    c001 HP-UX data            c002 HP-UX service
ea00 Freedesktop $BOOT     eb00 Haiku BFS             ed00 Sony system partitio
ef00 EFI System            ef01 MBR partition scheme  ef02 BIOS boot partition
Press the <Enter> key to see more codes: fd
fb00 VMWare VMFS           fb01 VMWare reserved       fc00 VMWare kcore crash p
fd00 Linux RAID
Hex code or GUID (L to show codes, Enter = 8300): p
Hex code or GUID (L to show codes, Enter = 8300): w
Hex code or GUID (L to show codes, Enter = 8300): q
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'

Command (? for help): p
Disk /dev/sdg: 15628053168 sectors, 7.3 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 99C74283-3F66-4AEB-B427-210253274010
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 15628053134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
1            2048     15628053134   7.3 TiB     8300  Linux filesystem

Command (? for help): wq

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdg.
The operation has completed successfully.

After exiting gdisk I ran a quick, basic read performance test, which I have found strikingly accurate on all the disks I’ve tested:

Read performance of 196MB/s was far greater than

userlogin@mybox:~$ sudo dd if=/dev/sdg of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 53.3865 s, 196 MB/s

sudo mdadm --create /dev/md8 --level 10 --raid-devices=2 missing /dev/sdg1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md8 started.
one@Opteron2419:~/Downloads$ cat /proc/mdstat
Personalities : [raid1] [raid10] [linear] [multipath] [raid0] [raid6] [raid5] [raid4]
md8 : active raid10 sdg1[1]
7813894144 blocks super 1.2 2 near-copies [2/1] [_U]

md125 : active (auto-read-only) raid1 sdd2[1] sdc2[0]
20490816 blocks [2/2] [UU]

md126 : active (auto-read-only) raid1 sdd1[0] sdc1[1]
943730240 blocks [2/2] [UU]

md127 : active raid10 sdf1[2] sde1[0]
732442112 blocks super 1.2 2 near-copies [2/2] [UU]

md2 : active raid10 sda3[2] sdb3[0]
709372928 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
bitmap: 5/6 pages [20KB], 65536KB chunk

md0 : active raid1 sda1[2] sdb1[0]
20955008 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[1] sdb2[0]
2097088 blocks [2/2] [UU]

unused devices: <none>
one@Opteron2419:~/Downloads$ pvcreate /dev/md8
/dev/mapper/control: open failed: Permission denied
Failure to communicate with kernel device-mapper driver.
WARNING: Running as a non-root user. Functionality may be unavailable.
Device /dev/md8 not found (or ignored by filtering).
one@Opteron2419:~/Downloads$ sudo pvcreate /dev/md8
Physical volume "/dev/md8" successfully created
one@Opteron2419:~/Downloads$ sudo vgcreate backups /dev/md8
Volume group "backups" successfully created
one@Opteron2419:~/Downloads$ vgdisplay
/dev/mapper/control: open failed: Permission denied
Failure to communicate with kernel device-mapper driver.
WARNING: Running as a non-root user. Functionality may be unavailable.
No volume groups found
one@Opteron2419:~/Downloads$ sudo vgdisplay
--- Volume group ---
VG Name               backups
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               7.28 TiB
PE Size               4.00 MiB
Total PE              1907688
Alloc PE / Size       0 / 0
Free  PE / Size       1907688 / 7.28 TiB
VG UUID               DwxSxL-UmcV-TyjM-TUa1-ef4w-lrK5-cipxiI

one@Opteron2419:~/Downloads$ sudo mkdir /mnt/md8
one@Opteron2419:~/Downloads$ sudo vi /etc/fstab
one@Opteron2419:~/Downloads$ sudo lvcreate -L 7.28TB backups -n backuplv
Rounding up size to full physical extent 7.28 TiB
Volume group "backups" has insufficient free space (1907688 extents): 1908409 required.
one@Opteron2419:~/Downloads$ sudo lvcreate -L 7.2TB backups -n backuplv
Rounding up size to full physical extent 7.20 TiB
Logical volume "backuplv" created
one@Opteron2419:~/Downloads$ sudo lvcreate -L 7.26TB backups -n backuplv
Rounding up size to full physical extent 7.26 TiB
Logical volume "backuplv" already exists in volume group "backups"
one@Opteron2419:~/Downloads$ sudo mkfs.ext4 /dev/backups/backuplv
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=128 blocks
241594368 inodes, 1932735488 blocks
96636774 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
58983 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
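
The failed 7.28TB attempt above comes down to extent arithmetic, and it can be checked by hand. Here is a minimal Python sketch, assuming lvcreate read "TB" as TiB (which its "Rounding up" messages suggest) and using the PE size and free extent count from the vgdisplay output. The function name is my own, purely for illustration.

```python
import math

PE_SIZE_MIB = 4          # "PE Size" from the vgdisplay output
FREE_EXTENTS = 1907688   # "Free PE" from the vgdisplay output
MIB_PER_TIB = 1024 * 1024


def extents_needed(size_tib):
    """Extents needed for a size given in TiB, rounded up to whole extents."""
    return math.ceil(size_tib * MIB_PER_TIB / PE_SIZE_MIB)


for size in (7.28, 7.2):
    need = extents_needed(size)
    verdict = "fits" if need <= FREE_EXTENTS else "too big"
    print(f"{size} TiB -> {need} extents ({verdict})")

# Each 4 MiB extent holds 1024 ext4 blocks of 4 KiB; this should match the
# block count that mke2fs reported for the 7.2 TiB volume.
blocks_4k = extents_needed(7.2) * PE_SIZE_MIB * 1024 // 4
print("4 KiB blocks:", blocks_4k)

# Space left unallocated in the volume group after settling for 7.2 TiB.
unused_gib = (FREE_EXTENTS - extents_needed(7.2)) * PE_SIZE_MIB / 1024
print(f"unused: {unused_gib:.1f} GiB")
```

The "7.28 TiB" that vgdisplay shows is a rounded figure, so asking for exactly 7.28 needs slightly more extents than exist. lvcreate also accepts a size in extents, for example -l 100%FREE, which should avoid the guessing, although I did not use it here.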

sudo smartctl -a /dev/sdg
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-38-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z840F70X
LU WWN Device Id: 5 000c50 090ae82a1
Firmware Version: AR17
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun  3 23:08:40 2016 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 953) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x30b5)    SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   100   006    Pre-fail  Always       -       48250328
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       29055
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   063   045    Old_age   Always       -       37 (Min/Max 29/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   037   040   000    Old_age   Always       -       37 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   112   100   000    Old_age   Always       -       48250328
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       170162309300224
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       406660
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20496814

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

one@Opteron2419:~/Downloads$ sudo smartctl -a /dev/sdg
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-38-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z840F70X
LU WWN Device Id: 5 000c50 090ae82a1
Firmware Version: AR17
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun  3 23:08:43 2016 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 953) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x30b5)    SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   100   006    Pre-fail  Always       -       48382752
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       29186
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   063   045    Old_age   Always       -       37 (Min/Max 29/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   037   040   000    Old_age   Always       -       37 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   112   100   000    Old_age   Always       -       48382752
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       182961311842304
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       410761
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20496814

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
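
The two smartctl runs above are only three seconds apart, so most of the table is identical and the interesting part is which raw counters moved. Here is a rough Python sketch for comparing two such dumps. The function names are made up for illustration, and the sample rows are a few lines copied from the two runs above. It assumes the usual ten-column attribute layout and only compares the RAW_VALUE column.

```python
def parse_attributes(text):
    """Map attribute name -> raw value string from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # A data row has 10 columns and starts with a numeric attribute ID;
        # headers and the self-test tables do not match this shape.
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


def changed(before, after):
    """Attributes present in both dumps whose raw value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


first_run = """\
  1 Raw_Read_Error_Rate     0x000f   112   100   006    Pre-fail  Always       -       48250328
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       29055
190 Airflow_Temperature_Cel 0x0022   063   063   045    Old_age   Always       -       37 (Min/Max 29/37)
"""
second_run = """\
  1 Raw_Read_Error_Rate     0x000f   112   100   006    Pre-fail  Always       -       48382752
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       29186
190 Airflow_Temperature_Cel 0x0022   063   063   045    Old_age   Always       -       37 (Min/Max 29/37)
"""

diff = changed(parse_attributes(first_run), parse_attributes(second_run))
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")
```

For a new drive, the counters I would watch with something like this are Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable, all of which are still 0 in both runs.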

sudo badblocks /dev/backups/backuplv
badblocks: Value too large for defined data type invalid end block (7730941952): must be 32-bit value
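
The badblocks error above looks like a block count problem rather than a disk problem. badblocks counts in 1024-byte blocks by default, and the end block it complains about (7730941952) is exactly the size of the logical volume in 1 KiB units. A small Python sketch of the arithmetic, assuming the limit is an unsigned 32-bit block number (my reading of "must be 32-bit value", not something I have confirmed in the source):

```python
# Size of the logical volume, from the mke2fs output: 1932735488 blocks of 4096 bytes.
LV_BYTES = 1932735488 * 4096

# Assumed limit: block numbers must be below 2**32.
LIMIT = 2 ** 32


def block_count(block_size):
    """Number of blocks of the given size needed to cover the volume."""
    return LV_BYTES // block_size


for block_size in (1024, 4096):
    count = block_count(block_size)
    verdict = "ok" if count < LIMIT else "too large"
    print(f"-b {block_size}: {count} blocks ({verdict})")
```

If that reading is right, passing a larger block size with -b 4096 should bring the count back under the limit on a volume of this size.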

Updated Reliability Issues

I bought two more of these drives.  The original still works fine, but one of the newer ones, the identical model pulled from an external Seagate enclosure, already has bad sectors after only a few months of real usage.  It cannot be kept in the RAID array because of them (I have tried re-adding it, but some part of the disk near the end appears to be bad).

With so many dense platters, the probability of issues looks much higher than with any of the 2-3TB drives I’ve seen.

Also note that even during the initial copy of data to it, before it was in a RAID array, it would freeze and become unresponsive.  It appears to be a factory defect.  I am not sure whether Seagate will honor the warranty now that this drive has been removed from its original enclosure (there is no physical damage, but I am not sure of their policies).

I have a third one of these drives that I haven’t tested yet, and I hope it won’t suffer the same fate.

 

[3473975.270154] sd 8:0:1:0: attempting task abort! scmd(ffff880069352880)
[3473975.270168] sd 8:0:1:0: [sde] CDB:
[3473975.270173] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[3473975.270197] scsi target8:0:1: handle(0x000a), sas_address(0x4433221106000000), phy(6)
[3473975.270204] scsi target8:0:1: enclosure_logical_id(0x500605b0079b94b0), slot(6)
[3473979.121100] sd 8:0:1:0: task abort: SUCCESS scmd(ffff880069352880)
[3473979.126991] sd 8:0:1:0: [sde] Device not ready
[3473979.127005] end_request: I/O error, dev sde, sector 2056
[3473979.127013] md: super_written gets error=-5, uptodate=0
[3473979.127024] md/raid10:md99: Disk failure on sde1, disabling device.
[3473979.127024] md/raid10:md99: Operation continuing on 1 devices.
[3473979.127038] sd 8:0:1:0: [sde]
[3473979.127044] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[3473979.127049] sd 8:0:1:0: [sde]
[3473979.127053] Sense Key : Not Ready [current]
[3473979.127064] sd 8:0:1:0: [sde]
[3473979.127070] Add. Sense: Logical unit not ready, cause not reportable
[3473979.127080] sd 8:0:1:0: [sde] CDB:
[3473979.127087] Read(16): 88 00 00 00 00 00 ea 84 08 20 00 00 00 08 00 00
[3473979.127115] end_request: I/O error, dev sde, sector 3934521376
[3473979.127128] md/raid10:md99: sde1: rescheduling sector 7868514336
[3473979.165594] md/raid10:md99: sdd1: redirecting sector 7868514336 to another mirror
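
The hex bytes after "Read(16):" in the kernel log above are the SCSI command that failed, and they can be decoded to cross-check the sector number on the "end_request" line. A minimal Python sketch, using the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13); the function name is my own:

```python
def decode_read16(cdb_hex):
    """Return (lba, transfer_length) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # starting logical block
    length = int.from_bytes(cdb[10:14], "big")  # number of blocks requested
    return lba, length


lba, length = decode_read16("88 00 00 00 00 00 ea 84 08 20 00 00 00 08 00 00")
print(f"LBA {lba}, {length} blocks")
```

The decoded LBA matches the sector in the I/O error line for sde, and the request is for 8 logical sectors of 512 bytes, which is one 4096-byte physical sector on this drive.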

How To Make Your Office IT/Computer Hardware More Green

As part of doing my part for the environment, I consolidated a lot of the other server/computer hardware in my office into a low-power, quiet and cool-running Dedicated AMD Opteron Desktop Workstation Server built from scratch.  It has kept my office cooler and quieter, all while saving on power and, more importantly, the environment.  I should add that I also reduced the number of hard disks in my office from several, if not dozens, to just 6 (all of them larger, I believe 4-8TB each).

But I wanted to take it a step further, and I admit I was also motivated by the unfriendly smell of burning PCB that greeted me each morning no matter what I did.  It’s an exercise in efficiency, savings, the environment and your own health and sanity.

In my office I have a 24-port gigabit rack-mount switch and a 42U server rack where I store parts and other items for testing and development.  Believe it or not, this switch seems to have produced an incredible amount of heat and, even worse, the burning PCB smell, which can’t be healthy.  It still baffles me, because its fan is working just fine and the unit doesn’t get that hot.

This is where the waste came in: under my desk I have a small gigabit switch for all of my other devices such as VOIP, phone, printer, laptop, etc., and the 24-port switch only had 4 or 5 ports active.  I’ve kept the 24-port switch on the rack, ready to plug in, and swapped in a humble but efficient 5-port gigabit switch, which has reduced the heat and the bad PCB smell in the office.

One thing I admit I haven’t done is enable any kind of sleep mode for my desktop workstation, because it is nearly always active and I like to connect to it remotely at odd times of the day.  Still, this Opteron workstation runs cooler and quieter than my previous labyrinth of servers and workstations, so I can actually hear again.  By consolidating most services into a single unit with virtual servers, you can often eliminate the majority of power usage, which comes primarily from hot, power-hungry CPUs.  This is one reason why I haven’t upgraded to a newer Opteron architecture: yes, you get more cores per CPU, but the power usage ends up being higher than what I have now, it is no more efficient, and it far exceeds my current needs.