Fall 2018 Release – Infrascale DR v6.16

This Fall, we have releases for 2 of our services, Infrascale Disaster Recovery (IDR) and Infrascale Cloud Backup (ICB).

IDR v6.16 boasts quality of life improvements, key fixes and new support for VMware v6.7!

ICB v7.3 includes many fixes, performance improvements and remote management improvements for distributed networks, plus new support for Windows Server 2019 and MS SQL 2017.

Again, big thanks to our Partners, who continue to be a cornerstone of our mission to eradicate downtime and data loss for businesses of all sizes.

And of course, hats off to our Product and Dev teams for their agile performance and turn-around time!

THE HIGHLIGHTS

VMware 6.7 Support (IDR 6.16)

This is a big but easy one to describe. While most functions remained stable with the update to VMware v6.7, there were a handful of workarounds we employed for our administrators to regain full and consistent functionality. With IDR v6.16, all functionality and performance experienced with previous versions of VMware have been returned, with the added benefits of the items listed below.

Custom QEMU Commands during LocalBoot (IDR 6.16)

Like any high performance machine, it’s sometimes handy to get in and make some tweaks before hitting the road. Administrators are now able to do this without editing a config file each time, by configuring custom QEMU commands to run with boot operations.
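To make the idea concrete, here is a minimal sketch of how stored custom arguments could be appended to a boot command. It is an illustration only: the function name, the disk path and the base arguments are assumptions, and these notes do not describe how the appliance actually invokes QEMU. The flags shown (-m, -smp, -drive, -cpu, -k) are standard QEMU options.

```python
import shlex
from typing import List


def build_boot_command(disk_image: str, memory_mb: int, cpus: int,
                       custom_args: str = "") -> List[str]:
    """Build a QEMU argument list and append admin-supplied custom arguments.

    Hypothetical helper for illustration; not the appliance's real boot code.
    """
    base = [
        "qemu-system-x86_64",
        "-m", str(memory_mb),                       # guest memory in MB
        "-smp", str(cpus),                          # number of virtual CPUs
        "-drive", f"file={disk_image},format=raw",  # disk image to boot from
    ]
    # shlex.split keeps quoted values together, so the custom arguments can be
    # stored once as a single string and reused on every boot.
    return base + shlex.split(custom_args)


# Example: pick a CPU model and a UK keyboard layout for a locally booted machine.
command = build_boot_command("/tmp/client.img", 4096, 2, "-cpu Haswell -k en-gb")
print(" ".join(command))
```

Storing the extra arguments as data rather than editing a file on each boot is the point of the feature; the sketch only shows that the stored string can be applied the same way every time.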

Define Client Schedules During Client Creation (IDR 6.16)

This is one of those simple but important time-savers. Rather than configuring a client, THEN going back to edit the schedule, we’ve combined these steps into a single, fluid motion.

“Legal Hold” backup/DR jobs (IDR 6.16)

Whether for a legal hold or any other reason, sometimes you just need to stop a backup from getting purged by the retention policies you have set. Administrators can now “pin” backup jobs to do exactly this. When an administrator “pins” a job, they are able to set the time-frame before the job would cycle back into the regular retention policy for that particular location (primary or secondary location).

*Note, this feature, like retention policies, is set per backup location, either primary or secondary. We note this because most services in the market require identical retention periods for replication services, but we figured we’d let you decide.

Improved Mass Deployment/Management for Cloud Backup (ICB v7.3)

We continue to find great success with our administrators using ICB to protect data at the point of data creation, rather than limiting BCDR plans to central servers. With this growing trend and major differentiation when comparing against traditional DR vendors, we’ve received many improvement requests to keep this growth going strong.

Details can be found in the table below, but there were a number of improvements to help administrators protect data across all devices for their respective businesses. These include changes to deployment workflows, changes to required permissions that help when password resets are in order, and improvements addressing the difficulty of protecting personal folders when deployed with administrative credentials.

 

Full List

The table below contains the full release notes, and while there are too many to highlight, they add up to yet another major quality of life improvement for our techs and their customers.

Short Description | Explanation
FIX: Weekly appliance report could not be generated | Some customers were unable to get weekly reports and had to gather the information manually. This is no longer an issue. This bug fix saves 4-6 steps per report.
FIX: Update of Agent installed on client could not be completed due to service not stopping properly | In this case, the administrator had to spend additional steps and time manually stopping the service before restarting the update. This bug fix saves roughly 6 steps per agent.
FIX: DR backup could fail sometimes | In some cases, a DR backup would fail, which triggered alerts and additional follow-up. The causes varied, and in most cases the remedy was simply to re-run the backup. We’ve fixed what seems to be central to the issue, saving roughly 4-6 steps per event.
UPDATE: New technology for Browse & Restore function: increased list of supported file systems, stability, multiple bug fixes | Some newer types of file systems could not be browsed during restore before this technology was introduced. This drastically decreases the time spent restoring one or a few files, since customers no longer need to restore the whole system or mount a non-browsable disk.
UPDATE: Full support for VMware 6.7 | VMware 6.7 is now fully supported, whereas before, users could run into some issues here and there.
NEW: Ability to add DR schedule during creation of Client; column now shown by default in the list of Clients | Before, these were separate steps. Setting up new clients is now 4-6 clicks shorter.
FIX: Some jobs could not be deleted by retention | Retention periods are important to reduce unnecessary usage on both the primary and secondary appliances. Now, all jobs can be removed by policies unless exempted by the “pinning” feature.
FIX: Hardware Monitoring section did not show CPU and Hard Disk information on some models | Fixed so all data shows for all models.
UPDATE: Support for local boot of DR images / VM backups of clients with disks with 32 sectors per track | Customers with specific hardware models and disk geometry are now fully protected: we will be able to boot these machines in case of a DR event.
NEW: Ability to specify custom QEMU commands during LocalBoot | In very specific cases, local/cloud boot settings need some tweaks made by support inside the appliance, and these tweaks were not persistent. With this update, these settings can be stored without the need to edit configuration files.
FIX: Automatically set correct CPU model during LocalBoot; NEW: Ability to change CPU model during LocalBoot | During a DR restore, we automatically recommend/use settings based on the machine being booted, which saves time and reduces the chance of booting a machine into an environment that will underperform. We’ve improved this automation and extended customization of boot settings from being available only for cloud boots to restores on the local appliance as well.
REMOVED: Autoarchive of differential and incremental levels of backups on Virtual Machine | We treat all VM backups as fulls, so this option was inconsistent. Removing it is also a preparation step for the Retention 2.0 policy (coming in 6.17).
FIX: Disabled ability to edit clients on secondary appliance | Removed the confusing ability to edit clients on the secondary, which made no sense.
FIX: Disable visibility of Boot verification images in the list of clients in case Boot verification is turned off | If Boot verification was turned off after being in use for some time, the list would sometimes show very old boot verification results, which misled customers.
NEW: Ability to connect to appliances inside local network with https://devices.infrascale.com | This removes the need to connect a keyboard and mouse during initial setup in order to find the IP address of the appliance. After being turned on, any appliance will become visible on devices.infrascale.com.
FIX: Ability to bond network interfaces on some models of appliances | The new 550 appliance had some network settings disabled because it does not have boot ability, but those settings were required to increase connection speed by bonding network interfaces.
UPD: Set “date until” to +10 years from current moment for previously pinned jobs | Many compliance requirements specify the length of data retention, and the “pinned jobs” feature released earlier this year did not allow jobs to be excluded from retention policies for long enough, requiring a reminder and manual adjustment of the “pin” after some amount of time.
FIX: Unable to boot VM with dynamic disks in certain circumstances | This was a major fix for customers in this situation, as they would have to go through a much more traditional baremetal recovery process rather than a quick, easy DR recovery flow.
UPDATE: Reduce amount of storage used for performing DR backups | This helps reduce the need to upgrade primary appliances to make room for the temporary files stored during DR backup procedures. The temporary storage footprint is now significantly smaller. Without enough space, backup jobs would fail or run much more slowly and require a storage upgrade.
UPDATE: Preserve disks order during LocalBoot on Appliance or in the Cloud | Some VMs have a large number of disks, and in some cases our boot process could not detect the correct boot disk, which led to boot failures. We have implemented a more sophisticated algorithm, so the number of boot failures will decrease.
FIX: Ability to use UK keyboard layout; NEW: Ability to choose keyboard layout for locally booted machine | Keyboards aren’t all the same. The default option was the US ENGLISH QWERTY keyboard layout, and we now support the UK ENGLISH QWERTY keyboard.
UPDATE: More concise description of requirements for performing LocalBoot on Virtual Appliance | We have removed a pain point: customers trying to boot an image inside the virtual appliance were told only that not enough resources were assigned to boot. The message now states exactly how much memory and CPU are required.
UPDATE: Correct link to the Knowledge Base in Support tab of ApplianceInside | We now route customers to the maintained and updated version of Infrascale’s Knowledge Base.
FIX: Sometimes VMware backup of a turned-off machine could fail | If a VM within VMware was turned off, the backup would occasionally fail. This is common for VMs created as templates, but not regularly used.
UPD: Show user-friendly message on Browse and Restore if no data can be found | Our users were confused in some corner cases when we could not browse the disk and show its contents: we just showed an empty screen. Now we give a friendly message about such situations.
FIX: Jobs re-imported from archive or pushed back from secondaries will not be deleted by retention at once | When retention settings differ between the primary (shorter) and the secondary (longer), a job restored from archive or pushed back to the primary would already be past retention and would be deleted. Now it is automatically pinned so that the user is able to restore and manipulate the job.
NEW: Ability to pin particular job until some date in future | “Pinning” a job excludes it from retention policies for as long as determined by the user when “pinning” a job. This is helpful in many cases: compliance, breach investigation, employee turnover, legal disputes.
FIX: Garbage Collection history now shows data | Space on the appliance is freed during Garbage Collection. It is now possible to see how it is working.
FIX: VMware client will show progress of backup now | Reduced the number of clicks needed to check backup progress. It is now visible on all Clients.
FIX: Backup of a Hyper-V guest migrated to another host would fail; UPDATE: Stability of DR backups; UPDATE: Stability of Hyper-V backups | In general, this is a group of updates and fixes that improve reliability and reduce the TCO when protecting Hyper-V environments, as well as fixing an issue that occurred after Hyper-V VMs migrated to new hosts. Previously, these issues would result in alerts/warnings that would take techs time to evaluate and resolve.
NEW: RAID status is shown on console screen | Many of our administrators cited this as a big help rather than having to dig around for the info, deeper in the console. Here you go!
UPDATE: Ability to perform Linux DR backup with non-root user credentials | Many Linux distributions now have the root user deactivated by default as a security measure, so our customers could not back up such Linux machines. We have opened up the ability to perform backups with any user’s credentials (after special settings are done).
UPDATE: Changed customization of replication bandwidth limit | Some of our customers do not work Monday-Friday and would like a more flexible way to customize the replication bandwidth limit. Now any day of the week can be set to decrease speed.
FIX: Daily report will show correct restore point date in case the last backup has failed | The daily report was showing the first, very old restore point rather than the latest one, giving the impression that no recent backups were available for the machine. We now show the correct, most recent restore point.
NEW: In case of appliance decommission you can securely shred all data | We have added a script so that customers are able to securely delete all data on an appliance when decommissioning it.
FIX: DDFS in case of shutdown will be unmounted at once | We have changed the logic of deleting files in DDFS in order to be able to unmount the filesystem at once.

 

What’s Next?

We’re on the cusp of some major changes under the hood that will result in simpler deployment steps and improved performance.

We’re also going to be announcing updated and new integrations with Connectwise Automate.

Stay tuned!

-The Infrascale Product Team

Summer 2018 Release Part 2 – Infrascale DR v6.15

Part 2 of our summer release schedule is days away! Meet IDR v6.15.

IDR v6.15 boasts quality of life improvements and key fixes. This release removes many common procedures by adding in some automation and/or shorter workflow options. We’ve estimated a reduction of roughly 20 steps across multiple, recurring scenarios, which could easily be 200-500 fewer tasks performed per year per IDR deployment. That’s money in the bank.

Our ability to turn around such a quick and valuable release is due largely to our terrific community of Partners that continue to be a cornerstone of our mission to eradicate downtime and data loss for businesses of all sizes.

And of course, hats off to our Product and Dev teams for their agile performance and turn-around time!

Quality of Life Improvements

NEW: Automated, Online DDFS compact

Previously, admins would receive a warning/error from the monitoring system saying storage limits have been reached or were close to being reached.

Next, the admin would try to free up some space by deleting jobs and/or would contact support for assistance.

Support would then suggest running “compact” to free up the needed space. Doing so requires shutting down the entire appliance and could take days to complete; roughly 1 day per 1TB of freed space.

By automating the DDFS compact task to take place in the background, we’ve eliminated at least 4 steps (per occurrence) and eliminated downtime during such an event.
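As a quick back-of-the-envelope illustration of what the old offline process cost, the rule of thumb quoted above (roughly 1 day per 1TB of freed space) can be turned into an estimate. The function name and default rate are assumptions for illustration; actual durations will vary by appliance and data.

```python
def estimate_offline_compact_days(space_to_free_tb: float,
                                  days_per_tb: float = 1.0) -> float:
    """Estimate appliance downtime for the old offline compact.

    Uses the rough "1 day per 1TB freed" rule of thumb; hypothetical helper.
    """
    if space_to_free_tb < 0:
        raise ValueError("space_to_free_tb must not be negative")
    return space_to_free_tb * days_per_tb


# Freeing 3TB the old way would have meant roughly three days offline.
print(estimate_offline_compact_days(3))  # prints 3.0
```

With the automated, online compact, that estimated downtime no longer applies because the task runs in the background.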

NEW: Ability to unlock VMware VM migration option from Appliance

Before a VMware VM is protected, we must disable the ability for a VM to be migrated to ensure the backup is successful. Sometimes, the VM doesn’t properly unlock afterwards, requiring the user to manually unlock the VM by either running another backup or going into vCenter.

While we’re still working on an automated resolution, adding an option to unlock a VM from within our system now allows an administrator to manually unlock the VM in 1 step instead of 2 or 3. We’re working to better avoid the issue altogether in later releases.

NEW: Archive Option – Ability to Pin/Unpin Jobs to be ignored by retention policies

Whether due to a hardware refresh, employee turn-over or some other event, it’s important to be able to retain specific backups despite the retention policy set for them by the client. This new “pinning” feature allows administrators to do just that.

First, a vocab review – within Infrascale IDR, a client is created by defining a retention policy and a schedule for a particular machine (virtual or physical). When a backup runs according to the configured client, that’s called a job. Jobs are stored on the primary and/or secondary for as long as the retention policy indicates.

Pinning a job will exclude it from the retention policies set for that client and will default to keeping that job indefinitely–essentially, an archival option.

For example, if you’re decommissioning a machine but you want, need, to keep a backup for it, you’d pin the jobs you want to keep before removing the client. Compliance, maintained.

This is a first step in many to add more flexibility and customization when needing exceptions at the job-level.

Important note* Like retention policies, pinning jobs is done individually for the primary and the secondary (or cloud) appliances; pinning a job on your primary will not auto-pin the same job on the secondary. Be sure to pin jobs on the appliance where you’ll want the archived job to be kept, or both.
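For readers who think in code, here is a minimal sketch of how pinning interacts with a retention policy. It is not Infrascale’s implementation: the Job class, its field names and the jobs_to_purge function are hypothetical, and the optional expiry date reflects the “pin until a date” behavior described in the v6.16 notes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Job:
    name: str
    created: date
    pinned: bool = False
    pinned_until: Optional[date] = None  # None on a pinned job = keep indefinitely


def is_protected(job: Job, today: date) -> bool:
    """A pinned job is exempt from retention until its pin expires (if ever)."""
    if not job.pinned:
        return False
    return job.pinned_until is None or today <= job.pinned_until


def jobs_to_purge(jobs: List[Job], retention_days: int, today: date) -> List[Job]:
    """Return jobs older than the retention window that are not protected by a pin."""
    cutoff = today - timedelta(days=retention_days)
    return [j for j in jobs if j.created < cutoff and not is_protected(j, today)]


today = date(2018, 9, 1)
primary_jobs = [
    Job("old", date(2018, 1, 1)),
    Job("pinned-forever", date(2018, 1, 1), pinned=True),
    Job("pin-expired", date(2018, 1, 1), pinned=True, pinned_until=date(2018, 8, 1)),
    Job("recent", date(2018, 8, 20)),
]
print([j.name for j in jobs_to_purge(primary_jobs, 90, today)])
# prints ['old', 'pin-expired']
```

Because pins and retention policies are set per appliance, the same check would run separately against the primary’s job list and the secondary’s job list, each with its own retention period and its own pins.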

Noteworthy

We’ve also added ‘deleting a client’ as a recorded event in audit logs and removed a few clicks from everyone’s life by adding the firmware version on the login screen in addition to the settings area. Strangely but understandably, we slowed down the initiation of a local boot with a 5 second delay so admins have time to smash into safe mode for some debug.

•    UPDATE: Audit logging – added “client deletion” as a new event

•    UPDATE: Firmware version is now visible on login screen

•    UPDATE: Added 5 second delay before localboot (to access safe mode)

•    UPDATE: Change filtering logic by dates in all grids on Appliance interface

 

BUGS SMASHED

•    FIXED: Issues restoring DR image backups with multiple disks

Introduced in 6.14, there was a reported issue wherein a second disk on a DR image backup would timeout during a restore, and we’ve squashed this out.

•    FIXED: VMware VMs that have not been powered on will not be protected

This is for all you VM templaters out there. If you upload a VM image to the VMware host, as a template, then this template VM would not be protected by our system until it was powered on. Now, admins can protect their VM templates without doing this step.

•    FIXED: Incremental Backup fails on VMware VM with a newly added disk

The workaround for this was to run a full after a new disk was added to a VM. That step is no longer needed.

•    FIXED: Hyper-V backup jobs hang/freeze

6.14 introduced new parallel processing to handle multiple jobs at one time. Some of our partners reported instances wherein Hyper-V backups would freeze and this has been resolved.

•    FIXED: Improved Replication Process

Customers may not have noticed a problem here as the job would restart where it got stuck and, at most, it would appear that a replication job took a bit longer than expected. We resolved this issue so replication jobs are more stable.

•    FIXED: False positive – failed verification status after replication

Verification of jobs would appear failed until an automated task would run after the job was completely closed. We’ve changed this verification step to instead be a part of the backup process, eliminating these false positives from alarming administrators and filling up ticketing queues (and the steps that go with closing them).

•    FIXED: Hyper-V backup fails if the password had been modified

 

Not bad for less than 2 months since our last release, eh?

 

What’s Next?

Good news looking to the 6.16 release set for September/October 2018.

Key highlights for 6.16 are VMware 6.7 support and a new Super Agent with setup and configuration improvements.

Thank you!

-The Infrascale Product Team

As of January 1st, 2019, Infrascale Cloud Backup will no longer support backup of Windows XP and Windows Server 2003 endpoints

As of January 1st, 2019, Infrascale Cloud Backup will no longer support backup of Windows XP and Windows Server 2003 endpoints, regardless of the installed version of the application. For other versions of Windows, we recommend installing the latest updates.

Additionally, as of January 1st, 2019, Infrascale Cloud Backup will no longer support outdated versions of its desktop applications: Windows clients below v6.8 and Mac clients below v3.7.

What is happening and why?

As we release new versions of our Cloud Backup software to include additional features, better performance, and enhanced security, these versions are not always compatible with older operating systems. In fact, Microsoft stopped supporting Windows XP in April, 2014 and stopped supporting Windows Server 2003 in July 2015.

Currently, Windows XP and Windows Server 2003 use the TLS v1.0 protocol and the 3DES and AES128 ciphers (encryption algorithms), which pose vulnerabilities. To learn more about why Microsoft is encouraging its users to update from TLS v1.0, please read this blog from Microsoft: https://blogs.microsoft.com/microsoftsecure/2017/06/20/tls-1-2-support-at-microsoft/

The Transport Layer Security (TLS) protocol is intended to serve as a secure link between a client machine and the server or Web application. While Infrascale supports a variety of other, more secure TLS versions, we have decided to stop supporting TLS v1.0 due to security concerns.

Phasing out outdated versions of our software and disabling vulnerable 3DES and AES128 ciphers allows us to strengthen the security of your data.

The National Institute of Standards and Technology (NIST) advises all users to migrate to stronger ciphers: https://csrc.nist.gov/News/2017/Update-to-Current-Use-and-Deprecation-of-TDEA
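If you want to see what “no TLS v1.0” looks like from the client side, the following minimal sketch uses Python’s standard ssl module to build a context that refuses anything older than TLS 1.2 and to list any 3DES cipher suites that are still enabled. The helper names are assumptions for illustration, and this is not how Infrascale’s software is implemented; it only demonstrates the kind of settings discussed above.

```python
import ssl
from typing import List


def make_modern_tls_context() -> ssl.SSLContext:
    """Create a client context that will not negotiate below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def enabled_3des_ciphers(ctx: ssl.SSLContext) -> List[str]:
    """List enabled cipher suites whose OpenSSL names indicate 3DES."""
    names = [cipher["name"] for cipher in ctx.get_ciphers()]
    return [n for n in names if "3DES" in n or "DES-CBC3" in n]


ctx = make_modern_tls_context()
print("Minimum TLS version:", ctx.minimum_version.name)
print("3DES suites still enabled:", enabled_3des_ciphers(ctx))
```

The list of enabled suites depends on the local OpenSSL build, so the second line of output will differ from machine to machine.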

Your Options

Option A (Preferred): If you are running Infrascale Cloud Backup on a Windows XP computer or with Windows Server 2003, you need to update your operating system to Windows Vista or a later version by following the instructions on Microsoft’s support site.

Option B: You can put the data on a network share that is accessible from other computers in your network. Infrascale’s Online Backup and Recovery Manager can be installed on another computer and you can continue backing up your data from the network share.

Also please make sure that you are running the latest version of Infrascale software that can be downloaded here: https://www.infrascale.com/downloads/

If you have any questions, please email support at support@infrascale.com

Summer 2018 Release – Infrascale DR v6.14

Infrascale Disaster Recovery (IDR) version 6.14 is our big summer release, and has much to offer our customers and partners in terms of improved quality-of-life changes, security features, performance improvements and bug fixes. We’d like to start by thanking all our partners, especially those that worked with us to find solid and timely solutions to not just issues, but to overall usability improvements.

The IDR v6.14 release is scheduled for public availability July 2nd, 2018.

NEED FOR SPEED

Our partners and customers will be happy to know that, in our lab, our teams were seeing massive (up to 5X) speed improvements when protecting VMware environments with v6.14.

When protecting Hyper-V, we also saw significant performance improvements by enabling backup jobs to run in parallel (improvements will be greater for those with larger appliances protecting many smaller backup jobs versus those with fewer, but larger jobs).

In both cases, the performance improvements will allow customers to get their environments protected more quickly, which means less hassle managing network and system I/O during initial and regular backups and a chance for improved recovery point objective (RPO) goals (less data loss due to more frequent backups).

QUALITY OF LIFE IMPROVEMENTS

Quality of Life (QoL) improvements make up the bulk of the line items you’ll see below in the release notes. They range from usability improvements in the GUI, to new features that allow our administrators to automate testing and verification of the integrity of backups, to time-saving additions. Enjoy.

The QoL list is highlighted by the new boot verification option. This means admins can run these tests and have automated reports with screenshots of systems running to help themselves and everyone around them sleep easy knowing the system will be there for them when it counts.

In addition, there are a ton of time-savers in here like allowing administrators to perform tasks from within the secondary appliance GUI rather than having to switch to the primary, automation during initial setup and the ability to define individual disks on VMware VMs for backup rather than being limited to selecting entire VMs.

There’s a lot here, so check out the list below:

  • NEW TIME SAVER: Mass-update appliance firmware from Dashboard (for firmware after 6.14.0)
  • NEW PEACE OF MIND: Boot Verification of backup jobs (individual jobs, stay tuned for boot orchestration verification!)
  • NEW REPORTS: Daily backup reports have added clarity regarding overall daily backups and the inclusion of the new boot verification results
  • NEW CONTROL: Ability to select specific VMware disks within a VM to help save on local and cloud (secondary) backup space usage
  • NEW: Support of IDR 550 appliance – stay tuned for more info on this new, little workhorse for those smaller offices
  • TIME SAVER: Auto-configure RAID during initial provisioning of appliance, no reboot required
  • TIME SAVER: Simplification of QuickStart Wizard: no “Certificate” step, Time zone/Date/Time steps are combined for easier deployment and management of multiple IDR appliances
  • TIME SAVER: Allow manual deletion of jobs from secondary appliance
  • UX IMPROVEMENT: Revisited columns in Client / Summary view based on customer feedback
  • UX IMPROVEMENT: Client/Summary shows date of last successful backup for each client
  • UX IMPROVEMENT: Number of jobs pending replication is shown in Dashboard
  • UX IMPROVEMENT: Job message logs show timestamp
  • UPDATE: Default retention for new appliances is set to 3 months
  • UPDATE: Automatically delete failed jobs after 7 days
  • UPDATE: Protected Space calculation support for various file systems, software RAID, LVM, Windows Dynamic Disks

SECURITY

Security has long been a pillar of strength here at Infrascale, and we’ve brought some previously “upon request” options straight to your finger-tips. In addition, there are additional access controls that IT teams will appreciate, including the much in-demand ability to have multiple administrative logins. Check ’em out:

  • NEW: Ability to create multiple admin accounts on appliance (command line-only)
  • NEW: Email notification on login event for administrators
  • NEW: Option to require an appliance-specific password for remote access via Dashboard (that’s 2 sets of credentials, now)
  • NEW: Option to disallow Infrascale staff from accessing secondary appliance (we’ll ping you when needed)
  • NEW: Audit logs on-demand or via daily digest emails for key events–logins, DR boots, job deletions.
  • UPDATE: Email server settings support custom SMTP port and encryption
  • UPDATE: Enable/disable remote access on Dashboard credentials entry screen (enabled by default)

BUGS KILLED

Every sprint, our teams dedicate a portion of their efforts to killing bugs, dead. Here are the bugs we smashed with 6.14:

  • FIX: Multiple stability improvements for MS Exchange backup and recovery
  • FIX: Auto-archive option is now working
  • FIX: We’ve prevented a number of VMware errors that would be thrown during backup/restore by using dynamic buffer size
  • FIX: Rather than shutting off before the process finished, the auto-download firmware update will remain on until the appliance has been successfully set up
  • FIX: Resolved some issues with large files replicating (but failing) to a secondary appliance (or cloud)
  • FIX: Remote access and Support Tunnel stability
  • FIX: Sort devices alphabetically in Orchestration

Click here to join or view our IDR v6.14 Release Webinar.

There is still a lot of summer left, stay tuned for news on the next 6.15 release for even more improvements.

Thank you!

-The Infrascale Product Team

We put the A in ITRA

You are going to be hearing a lot about ITRA in the near future. It is a term that industry pundits devised to denote the next evolution of DRaaS. This is a new acronym that literally means Information Technology Resilience Assurance. According to the Oxford dictionary, Assurance means “a positive declaration intended to give confidence; a promise.” When applied to a description for a solution, it would then follow that the word “assurance” should denote a promise of performance, but most solutions labeled as ITRA solutions today lack any promise to perform or an SLA specifying the expected performance and timeframe for full recovery.

Today, businesses want the ability to maintain acceptable service levels even in the event of severe disruptions to their applications, data and IT systems. This means not waiting around for a disaster to occur, but rather incorporating early detection of events that may lead to downtime along with automated processes to mitigate any damage and minimize their impact on uptime.  To meet the needs of the vast majority of businesses, this requires a solution that is affordable, automated and easy to use.

Infrascale takes the last letter very seriously. In fact, we believe the industry is applying the ITRA label too freely, and to solutions that do not really fit the definition of the term. If you are buying assurance, you want a guarantee, not fluff. As the only vendor with a solution that includes a 15-Minute Failover Guarantee, we believe we put the A in ITRA.

2018 Proper Disaster Recovery Planning

The backup and disaster recovery industry has seen plenty of changes over the years, but with these changes has also come increased cost and complexity to most BDR environments. If you’re wondering why disaster recovery is so expensive and complex, you aren’t the only one. Most of us are left crossing our fingers and hoping whatever we have in place will work.

But the reality is that every modern company depends on data and operational uptime for its survival. There are no exceptions. Because of this, IT is tasked with finding an appropriate solution that’s cost-effective and works every time, guaranteed. This can prove to be quite a daunting task!

In order to avoid data disaster in 2018, a proper disaster recovery plan must be put in place. But protecting large data sets in a mixed environment isn’t simple or affordable with traditional DR solutions. It’s why every business thinks push-button failover is out of reach. Let’s break down the challenges of ensuring data and operational uptime, by looking at three key considerations in proper disaster recovery planning:

1. Compatibility

A proper disaster recovery plan includes a flexible solution that can meet your needs. Can it protect any OS and device? Can you store your data in any cloud? Can it be deployed as physical or virtual? These are some of the questions you want to ask in order to plan your DR strategy properly. Making sure that you find a solution compatible with your environment is a critical component.

2. Complexity

Simple is key. Push-button failover should mean exactly that, so your disaster recovery planning should allow you to fail over to a second site in minutes or seconds (not hours or days). No additional IT resources needed! When considering the complexity of a solution, built-in orchestration is one of the key differentiators among DR solution providers. Proper disaster recovery planning includes failback, and it’s important to understand exactly how the solution plans to fail over your applications and then fail back, in addition to how much customization and control you have in the whole orchestration process.

3. Cost

Look for a solution that is truly as-a-service, meaning no add-on charges or professional service fees. When keeping costs in mind, remember that no additional secondary site means no additional hardware costs. Planning your DR strategy with a provider who can give you a low, monthly subscription service means that everything is included — support, maintenance, unlimited testing, and hardware upgrades to name a few. A service solution provides the benefit of a single monthly subscription payment, without the unnecessary add-on fees.

Disaster recovery planning doesn’t need to make your head spin. Keep these three considerations in mind as you go about your strategy and implementation. Here at Infrascale, we minimize the risk of downtime at a price that will make your CFO smile. We allow IT to stop buying and managing disparate hardware and software to solve their DR needs. An administrative dashboard, accessible from any browser or device, makes it easy to recover mission critical applications and systems with push-button simplicity.

Spring Cleaning Starts Early in 2018 – Disaster Recovery Release v6.13.2

Greetings!

As part of our continued efforts to make using Infrascale a pleasant experience that simplifies your backup and disaster recovery lives, we’ve started spring cleaning early this year with the release of Infrascale Disaster Recovery (IDR) v6.13.2.

Again, big thanks to all of our partners and admins that helped report these issues and find resolutions.

Continue reading for a detailed explanation of the release or scroll to the bottom for the list.

For Those Protecting VMware Environments

Issues Running Hourly Backups of VMware

A large piece in the fight against data loss is your recovery point objective (RPO), which is driven by how frequently your backups run. More frequent backups mean less risk of data loss, and that’s good. However, many partners trying to reduce that risk to a mere hour reported JVM crashes when running in VMware environments, and a manual full backup was required to resolve the issue. We’ve fixed this issue and are happy to say that you can now run hourly backups without concern.

VMware Snapshots Causing Storage Issues

The next item was a set of reports of production storage being consumed by left-over VMware snapshots. Occasionally, our system left these VMware snapshots behind rather than cleaning them up, leaving some rather tedious work for admins. We’ve fixed the automated clean-up of these snapshots so you’ll no longer run into this problem.

Appliance Disconnects from VMware After Reboot

Classic tech-support steps: Is it plugged in? Try restarting it. In many cases, we found that primary appliances would not automatically reconnect to VMware after a reboot, meaning no backups would run. The result would be an influx of monitoring errors that backup jobs were skipped, sending support into a frenzy. We’ve fixed this so reboots no longer require a manual reconnection to VMware. After you update to v6.13.2, the appliance will automatically reconnect to VMware.

Additionally, we improved memory usage during VMware backup, so your backups should perform a bit faster now with fewer memory peaks.

And Now, the Bulk of the Release

Remote Access Goes Dark After a Connection Interruption

You’re working on a recovery, test or real, and suddenly you lose remote access and have to reconnect. Cue heart palpitations and expletives. Previously, to reconnect, you had to go to the primary and either restart the whole appliance or disable and re-enable remote access. That costs time, and time is money, especially in a real downtime scenario. It is also a huge problem if you’re not on site, which is most cases. We’ve added some logic on our end to prevent this from happening.

When accessing remote VMs after running cloud boot, admins would receive timeouts on the seventh session and beyond. We’ve upped the limit of how many remote access windows you can open from the Dashboard at a single time. You’ll still want to keep an eye on performance of your machine as you increase the number of sessions, but now you can launch as many as you like.

“Unknown” Status on Appliance Page

Similar to the remote access fix, those scary instances of “lost” appliances were also resolved. In the case that a connection interruption occurred between the appliance and the cloud infrastructure, admins would simply receive a shoulder shrug from the dashboard: no monitoring data, no usage data, nothing. The only fix was to reboot the appliance. We’ve changed the behavior so that your appliance doesn’t disappear in such a case, and we’ve put in work to ensure that connections are more stable.

Backups Stop with error “VimSDK Error: Bad Parameters of Function”

There were some reports from the community of a “VimSDK error: bad parameters of function” that started to pop up. We found that the issue was caused when Windows provisions a disk with a partition larger than the disk, causing the backups to fail with the aforementioned error. Our system can now recognize this occurrence and will continue on with backups as before.

Can’t boot inconsistent NTFS Volumes

In the scramble after hard resetting a production server, administrators will often need to run a system utility, ChkDsk, to put the system back into a consistent state. If admins didn’t get the chance before a backup ran, then our system would be unable to boot that or any subsequent version. While we can’t make things nicer on the Windows side, we did add a check before booting to see if the volume is inconsistent, and, if necessary, we’ll run the ChkDsk utility so the boot can perform as expected.

Primary appliance stops working if the secondary is running a different version

For paired-appliance setups that are not replicating to Infrascale’s cloud, there was no automated update on the secondary appliance upon updating the primary. This caused the backups to fail as well as any replications, loading up your ticketing queue with a ton of errors. We’ve now automated the update of the secondary appliance once you’ve updated your primary.

Sluggish Backups after a Firmware Update

In a few cases, we had reports of extremely sluggish backup performance after a firmware update. We found an error that moved a vital catalog off the solid-state drive (SSD) and onto the primary storage drives. While we can’t automate the fix, we have put in place a warning telling the administrator to contact Infrascale support so we can dig in and move the catalog back to the right spot.

Unable to Download Files via “Browse and Restore”

The granular file recovery from the cloud appliance didn’t work. This is obviously a super-critical issue and we commend both the reporter and our team for jumping on it ASAP.

Hyper-V Recovery Speed and Bandwidth Improvement

During a backup, we protect only the data that exists and make a note of the empty blocks on each volume. But, during recovery, we were transferring these ‘empty’ blocks. Transferring an empty block isn’t so bad, but transferring millions of them could significantly impact recovery time and waste valuable download bandwidth. We’ve changed the behavior to simply no longer send the empty blocks, and instead provide instructions for the recovery engine to provision as many empty blocks on each volume as existed when it was protected.
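The idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual recovery engine: the 4 KB block size, the in-memory byte strings, and the function names `split_blocks` and `rebuild` are all assumptions made for the example.

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(volume: bytes, block_size: int = BLOCK_SIZE) -> Tuple[List[Tuple[int, bytes]], int]:
    """Return the non-empty blocks (tagged with their byte offset) and the volume size."""
    data_blocks = []
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        if any(block):  # keep the block only if it holds at least one non-zero byte
            data_blocks.append((offset, block))
    return data_blocks, len(volume)


def rebuild(data_blocks: List[Tuple[int, bytes]], volume_size: int) -> bytes:
    """Provision a zero-filled volume, then write each transferred block into place."""
    volume = bytearray(volume_size)  # empty blocks are provisioned locally, not sent
    for offset, block in data_blocks:
        volume[offset:offset + len(block)] = block
    return bytes(volume)
```

Only the output of `split_blocks` would need to cross the wire in this sketch; the receiving side recreates the empty regions itself.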

Unnecessary Job Replication

This is another, yet larger, bandwidth saver. If you had an appliance running without replication, then down the line began replicating offsite, your appliance might have been unnecessarily replicating data that would just be removed due to retention settings for the job. To resolve this, we now check the retention settings before each replication event begins, and, if the data is set to be deleted upon arrival, we simply don’t replicate and cancel the job. The replication status will indicate that the job was cancelled due to retention policies.
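As a rough sketch of that pre-replication check, consider the Python below. It is not the appliance’s real logic: the `Job` fields, the single time-based retention window, and the status strings are hypothetical and chosen only to illustrate the decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    name: str
    completed_at: datetime
    retention: timedelta  # assumed: how long the remote site keeps this job


def replication_decision(job: Job, now: datetime) -> str:
    """Decide whether a job is worth replicating before any data is sent."""
    expires_at = job.completed_at + job.retention
    if expires_at <= now:
        # The job would be deleted on arrival, so skip the transfer entirely.
        return "cancelled: retention policy"
    return "replicate"
```

Checking before the transfer starts, rather than cleaning up afterwards, is what saves the bandwidth.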

 

Release Notes

  • FIX: JVM crash during frequent incremental VMware backups
  • FIX: Background cleanup of VMware snapshots that were left behind
  • FIX: Do not replicate jobs that will be deleted remotely due to retention
  • FIX: Remote access may become unavailable after interruption of network connection between appliance and cloud infrastructure
  • FIX: Unable to open more than 6 remote access windows from Dashboard at the same time
  • FIX: Stalled information and “Unknown” status on Appliances page in Dashboard after interruption of network connection between appliance and cloud infrastructure
  • FIX: Restore of Hyper-V VMs only transfers information inside disk image and doesn’t transfer empty blocks
  • FIX: Proper DR Image backup of partitions that are outside of disk bounds (“VimSDK error: bad parameters of function”)
  • FIX: Primary appliance stops working after some time if secondary is on incompatible version
  • FIX: Notification on Appliance UI if Catalog volume is not on SSD
  • FIX: Always reconnect to VMware after reboot of appliance
  • FIX: Unable to “Browse and Restore” files from cloud appliance
  • FIX: Unable to perform boot of Windows machines with inconsistent NTFS
  • FIX: Memory leak in JNI during backup of VMware

10 Ways DRaaS Can Save Your Bacon

Names can be so misleading.

Take baking soda. Many of us think of baking soda as only an ingredient used for baking, or maybe something that helps to keep our refrigerators odor-free. But baking soda has many other uses, and is surprisingly good for your health and home, too. The use cases for baking soda range from basic daily hygiene to injuries, digestive issues, stomach pain, coughs and even sore throats.

DRaaS (disaster recovery as a service) faces the same perceived limitations. There are many applications of DRaaS that go beyond recovering your operations in the wake of a genuine site-wide disaster.

That’s why we created this infographic: 10 Ways DRaaS Can Save Your Bacon.
Download your copy:  HERE.

  1. Recover from Ransomware…Fast

    It’s one thing for a user’s files to get infected by ransomware; it’s quite another to have a production database or mission-critical application infected. But restoring these databases and apps from a traditional backup solution (appliance, cloud or tape-based backup) will take hours or even days, which can cost a business tens of thousands of dollars.

  2. Acts of Nature: Hurricanes, Tornadoes & Floods

    If your data center gets knocked offline by Mother Nature, you need a Plan B to restore operations, so your employees can stay productive and your customers aren’t disrupted. DRaaS offers simplicity, rapid recovery, and lower costs (both in terms of infrastructure and administrative overhead). Just as important, replicating your backups and other key resources in geographically disparate data centers also means they won’t be wiped out by local disasters. Because your data and VMs are replicated to the cloud, failing over production systems to the cloud takes minutes, even if your server is under water.

  3. End User Errors

    Recent ransomware attacks, including WannaCry and Petya, prove the adage that a chain is only as strong as its weakest link, and the weakest link in a data security chain is very often the end user. Ransomware spreads easily across connected systems once a user unwittingly allows entry. Spoofing and phishing are not simply about stealing data or credit card numbers; they are about stealing access to systems. DRaaS equips you with a reset switch that helps you recover from end user mistakes – whether it be a phishing attack or accidental deletion of critical files.

  4. Spilt Coffee, Power Surges, and Bad Disks (Micro-Disasters)

    While hurricanes and natural disasters grab all the headlines, it’s far more likely a company will face downtime from such mundane causes as hardware failure, corrupted software, human errors, or even spilt coffee. For this reason, a cloud DR solution that includes an on-site storage component and the ability to provide local, rapid recovery for failed servers has considerable appeal. DRaaS is absolutely built for these types of micro-disasters.

  5. Hardware Upgrades

    Traditionally, hardware refresh cycles have averaged around five years, but they have accelerated during the last decade. Some businesses now work on a three-year replacement cycle. Replacing servers and other critical hardware allows organizations to deploy updated equipment intended to improve reliability, enable new and anticipated capabilities, and save money in the long term, but these upgrades are usually accompanied by significant “planned downtime”, typically scheduled in the middle of the night or on weekends. With DRaaS, you can failover your production environment to the cloud where you can comfortably run your production operations – effectively eliminating any planned downtime. Once running in the cloud, you can perform the upgrade or refresh to your production equipment and then shift replication from the cloud to your production data center via “failback” procedures.

  6. Sandbox for Production Testing

    Maintaining a separate test environment can be expensive, especially when you want to do a test against full production data. Modern disaster recovery as a service solutions often include the ability to “sandbox” or partition virtual machines so testing can be done without impacting the still-functional production servers. Sandboxing is often much more difficult in typical on-premises solutions using traditional virtualization management tools.  Since it already contains replicas of your systems and built-in network connectivity, your DRaaS environment can easily be repurposed as a sandbox for production testing.

  7. Lift and Shift Workloads to the Cloud

    According to the Harvey Nash/KPMG 2016 CIO survey, 31% of responding CIOs said they are investing significantly in the cloud today and 49% expect to do so over the next three years. In fact, Forrester expects 50% of large enterprises to have production workloads running in the cloud by 2018. But migrating workloads to a public cloud demands a seamless, non-disruptive transition. This migration process is known as “lift and shift.” And here again, DRaaS can play an important role by enabling you to automatically capture workloads and migrate them to the cloud – effectively running your applications in failover mode in perpetuity.

  8. Pass Compliance Mandates with Flying Colors

    Disaster recovery solutions are just about table stakes for any modern organization, but they are especially important for public companies and organizations governed by compliance mandates (such as financial services, banking, and healthcare organizations). With DRaaS, testing and monitoring your DR plan is becoming simpler with Drag and Drop Orchestration, all baked into the cost of the subscription. Plus, leading DRaaS providers offer smaller agencies enterprise-class security and encryption of data in transit and at rest within top-tier data centers.

  9. Stolen Laptops

    Because of the mobility of today’s workforce and the dispersion of important intellectual property within your organization, companies more and more are asking, “So, I understand all this data is out there. What happens to it if something gets lost or stolen?” When your company laptop goes missing, whether it was stolen from your car, forgotten in the airport security line or physically wrenched from your hands in a grab-and-run, it’s time to leap into action! With DRaaS, you can quickly recover your data and applications on a new (or temporary) laptop since your data is always protected within the cloud.

  10. Ahhh…Peace of Mind

    It’s hard to put a price tag on peace of mind.  Real-time DR solutions are expensive, complex and require a fair amount of hand holding. DRaaS presents a refreshing alternative that provides flexibility in terms of commitment, capacity and cost.  But, more importantly, it’s your insurance policy that protects against the unexpected. DRaaS provides operational resiliency that lets you spin up VMs — locally or in the cloud — in minutes.  With a proper DRaaS solution, you’ll minimize the loss of production data and impact of downtime to your business.

DRaaS isn’t just for disasters any more. It’s for micro and macro outages. It’s designed for rapid failover of routine server outages and rapid recovery from full-blown ransomware attacks.  It’s for system upgrades, hardware refreshes and lift and shift migrations.

Like baking soda, there are probably other compelling use cases of DRaaS – ones that we haven’t even imagined.  If you’re leveraging DRaaS in interesting and unexpected ways to protect your organization, we want to hear from you — email us at team@infrascale.com.

Top 5 Ways to Protect Against Ransomware

You’ve seen the headlines. When it comes to ransomware strains like Locky, WannaCry, and Petya, we’re all at risk. What’s more, with the growing ransomware-as-a-service (RaaS) trend, cybercrime is now at an all-time high and accessible to nearly anyone.

Since the introduction of RaaS, negotiating with hackers has become a business in and of itself. We see websites offering up the latest advice to hackers, ransomware customer service lines, and FAQs available to help victims make Bitcoin payments.

So, why do organizations pay the ransom anyway? Well, in many cases, an organization’s systems were never backed up properly, or the backups were too old. In others, the recovery attempts failed – maybe there was no DR testing, leaving no usable backups from which to recover. Often the amount of time it takes to recover is far more costly — in terms of downtime — than paying the ransom fee itself. In other words, the process is simply broken.

What’s critical to understand is how ransomware gets into your organization, and more importantly, how you can protect your business from current and future threats of ransomware.

1. Best Practices for Ransomware Prevention

First and foremost, to protect against ransomware, start by doing what you can from a prevention standpoint.

  • Make sure servers and firewalls are all patched.
  • Update your anti-virus software with latest signatures.
  • Train users to recognize suspicious emails and attachments, and to identify nefarious websites.

While this may sound like old news, it’s a critical component to ensuring that you have a proper disaster recovery prevention plan in place.

2. Update Your Backup Process

Long gone are the days when overnight backups every 24 hours were sufficient for proper data protection. A quick and easy fix? Increase your backup frequency. In order to minimize downtime associated with an outage, you should be backing up in 15-minute increments. Your solution should be able to set policies on those backups and alert the administrator to any errors.

Also, to protect against ransomware, data should be safely stored both on-premise and off-site. In addition, you want to ensure that you protect all of the servers in your environment, whether they be physical or virtual, with the same level of security. You may instinctively focus on mission-critical applications like Microsoft SQL, Exchange, and your financial systems, but don’t overlook those file servers that are also susceptible to attack.

3. Evaluate New Technology

The requirements mentioned above are now considered table stakes, and legacy backup systems simply don’t cut it. Traditional backup applications cannot deliver the capabilities needed for a modern data protection and ransomware solution, because they take too long to recover running systems. That’s where Disaster Recovery as a Service, better known as DRaaS, comes in. DRaaS replicates and protects your entire environment and lets you quickly failover your systems – not just files and folders – to ensure uptime and availability when something goes wrong, as in a ransomware attack.

When considering new technologies to protect against ransomware, take into account that there are many different ways to define a DRaaS solution. Ensure that you’re comfortable with how you will be able to back up and recover critical systems and data, as well as the flexibility in backup targets and recovery options. Ensure that your chosen solution also addresses compliance mandates, as needed.

4. Early Detection Capability

In a ransomware attack, time is your worst enemy. By the time encryption hits, you could have thousands of files encrypted in mere seconds. What’s worse, if you wait for your end users to identify that encryption is spreading via a ransomware attack, you’re going to have a much larger problem on your hands. The longer it takes to detect an issue, the more files are getting encrypted!

Ransomware can spread like wildfire, but early detection capabilities are available. IT needs a solution that will measure high change rates in files, thus using the way ransomware works — against it. Ransomware opens files and changes files in the system. Protect against ransomware by utilizing a solution that can identify a high change rate of modified files on a per-user basis.

If you’re using the 15-minute backup frequency we recommend, you can prevent most of the damage of the attack by simply having this proactive alerting system in place.
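The change-rate idea described in this section can be sketched as a per-user sliding window. The Python below is a simplified illustration, not any product’s detection engine: the class name, the 60-second window, and the threshold of 100 changes are all hypothetical values you would tune for your own environment.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ChangeRateMonitor:
    """Flag users who modify an unusually large number of files in a short window."""

    def __init__(self, window_seconds: float = 60.0, max_changes: int = 100):
        self.window_seconds = window_seconds  # assumed window length
        self.max_changes = max_changes        # assumed alert threshold
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, user: str, timestamp: float) -> bool:
        """Record one file modification; return True if the user exceeds the threshold."""
        events = self._events[user]
        events.append(timestamp)
        # Drop modifications that have fallen out of the sliding window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_changes
```

A caller would feed `record` one event per modified file and raise an alert (or pause that user’s access) the first time it returns `True`.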

5. Lightning Fast Failover

If you are infected with ransomware and have to recover your data and systems, an important concern is to ensure the recovery process is faster and easier than paying the ransom. There could be hundreds of thousands of files infected, and you need to recover them quickly. Your best bet will be recovering the full server, rather than individual files.

Failover technology will give you the ability to boot and run from a backup. But, not all failover solutions are created equal. Only certain solutions give you the ability to boot from the backup and run either on-premise or in the cloud.

With the Infrascale Disaster Recovery (DRaaS) solution you can simultaneously cloud boot multiple versions of the same machine to determine the safe version to recover, and boot either to the cloud, a virtual environment, or recover to production hardware. The total downtime is about 1-2 minutes, saving a lot of time and money. The Infrascale DRaaS solution also includes built-in failover orchestration that lets you create predetermined failover plans, which can be scheduled to boot simultaneously or in a specific order.

No matter what DR solution you choose, it’s so important to understand exactly how the solution plans to failover your applications, and then failback, in addition to how much customization and control you have in the whole orchestration process.

With these 5 recommendations in place, you’re closer to staying protected against the current threats of ransomware. There’s no telling what ransomware attacks will look like in 2018, but we know that ransomware will continue to get more sophisticated, more intelligent, and more harmful as time goes on. You can’t completely prevent ransomware, but you can keep yourself educated and up-to-date on the most recent technology solutions available. Also, look to the experts to vet and validate what you learn when it comes to ransomware protection.

The Infrascale approach is getting a lot of attention, from leading analyst firms like Gartner and others. Gartner named us the 2015 Cool Vendor in Business continuity and disaster recovery, a 2016 Visionary in Disaster recovery as a service, and a Leader in the 2017 Magic Quadrant for DRaaS.

Want to see for yourself? Download a copy of the report here.

Why ‘RTO’ is Key to Business Continuity

If you have never seen or heard the term ‘RTO’ in the context of your business continuity plans or tests, then this will give you a solid next step to ensure that you’re in a good position. Unfortunately, nearly 80% of all SMBs are in the same boat, which has been and continues to be massively exploited by criminal organizations using ransomware to make money. Lots of it.

To paraphrase an old tech adage: “If you can’t recover quickly, then it’s not a backup.”

What is RTO?

Recovery Time Objective, or RTO, is the time it will take to restore business operations after any downtime event caused by hardware failures, ransomware infections, software errors, human errors, or natural disasters.

Unfortunately, for many businesses, the problems that arise when RTO is not a key component of the plan aren’t realized until it’s too late. Many organizations have found this out over the last few years because of the ever-growing threat of ransomware attacks.

Many businesses with preventive measures and backups in place end up in a bad situation because their plan didn’t factor in the recovery time for restoring production databases or mission-critical applications. Read our Tale of Two Ransomware Victims for more info.

What is business continuity and what role does RTO play?

Business continuity is the ability of a business to remain in operation despite risks and events of downtime and disasters. By the numbers, 80% of businesses experience some type of unplanned downtime. Of this total, some experience catastrophic outages that knock them offline for 3-5 days, and a portion of these never recover and ultimately go out of business as a result of the outage.

Simply put, RTO is Business Continuity.

A proper business continuity plan includes:

  1. Identification of potential downtime risks
  2. Evaluating the business impact of those risks
  3. Identifying ways to prevent those risks
  4. Identifying ways to recover from downtime
  5. Regular testing of those methods against specific risks
  6. Regular re-evaluation of risks & methods

Your prevention and recovery needs are directly based on the evaluation of risks. Such an evaluation is known formally by Project Management Professionals (PMPs) as a “Risk Register.” Don’t worry, it sounds like more work than it is.

It’ll actually save you time and ensure that all your bases are covered, because you’ll understand your critical systems and their dependencies.

Evaluating Your Risks

Evaluating risks can start pretty general and become more specific as you get closer to making buying decisions. For example, the table below, developed by American Precision Industries, focuses on recovery at a system level.

Application/Data/System | Impact | Chance | Risk Factor | Recovery Plan
CAD application server | 99% | 100% | 99% | Infrascale Disaster Recovery replicating from site A to site B. Local boot for testing or individual machines. File recovery readily available from either site. Spare hardware required in the event of hardware destruction. Restore time is less than 20 minutes once hardware is available for recovery.
Machine Tools | 100% | <1% | <1% | N/A. These units are closed systems.
CAD files | 80% | 100% | 80% | Files are protected by Infrascale Disaster Recovery and replicated to a secondary DR appliance and are available for restore within minutes. Files can be recovered to any USB device to then be fed to the machining tools’ systems.
Payroll DB | 60% | 100% | 60% | Infrascale Disaster Recovery replicating from site A to site B. Local boot available for recovery in less than 10 minutes. Production recovery time dependent on available hardware, less than 20 minutes once available.
Customer/Order DB | 80% | 100% | 80% | Infrascale Disaster Recovery replicating from site A to site B. Local boot available for recovery in less than 10 minutes. Production recovery dependent on available hardware, less than 20 minutes once available.
CAD user endpoints | 70% | 100% | 70% | Systems are backed up centrally and covered in DR backups onsite and replicated to the secondary. Endpoints can be restored within 20 minutes once hardware or a VM is available.

 

The table above shows three numbers for each system: Impact (how much of the business will be inoperable if this system goes down), Chance (the likelihood of that system experiencing downtime, all risks included), and Risk Factor, which is the product of Impact and Chance. The rule of thumb is to pay close attention to any Risk Factor over 10%.
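As a rough illustration of the arithmetic behind the table above, the following Python sketch multiplies Impact by Chance and flags anything above the 10% rule of thumb. The tuple layout and function names are assumptions made for the example, and the “<1%” chance for the machine tools is treated as exactly 1%.

```python
# (system, impact, chance) taken from the table above, as fractions
systems = [
    ("CAD application server", 0.99, 1.00),
    ("Machine Tools", 1.00, 0.01),  # "<1%" chance approximated as 1%
    ("CAD files", 0.80, 1.00),
    ("Payroll DB", 0.60, 1.00),
    ("Customer/Order DB", 0.80, 1.00),
    ("CAD user endpoints", 0.70, 1.00),
]

ATTENTION_THRESHOLD = 0.10  # the 10% rule of thumb


def risk_factor(impact: float, chance: float) -> float:
    """Risk Factor is the product of Impact and Chance."""
    return impact * chance


def needs_attention(rows, threshold: float = ATTENTION_THRESHOLD):
    """Return the names of systems whose Risk Factor exceeds the threshold."""
    return [name for name, impact, chance in rows
            if risk_factor(impact, chance) > threshold]
```

With these inputs, every system except the machine tools lands above the threshold, which matches the table: the closed systems are the only row without a recovery plan.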

Once all systems are listed and evaluated, you can begin weighing disaster recovery options and RTO targets. This will ensure that you have the plan you need rather than a mix of “too much” or, even worse, “too little”.

You can also add specific uptime goals for specific systems, like this:

Application/Data/System | Hardware | OS | RTO, Uptime
CAD application server | IBM compatible | Windows | <12 hours, 99%
Machining Tools | Proprietary | Proprietary | N/A, 99.9%
CAD files | IBM compatible | Windows | <12 hours, 99%
Payroll DB | IBM compatible | Windows | <24 hours, 99%
Customer/Order DB | IBM compatible | Windows | <24 hours, 99%
CAD user endpoints | Various | Windows | <12 hours, 99%
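
To see how an uptime goal and an RTO relate, it can help to convert the uptime percentage into a yearly downtime budget. The sketch below is illustrative only; the function names and the 365-day year are assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year that still meet the uptime goal."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def outages_within_budget(uptime_percent: float, rto_hours: float) -> int:
    """Number of worst-case outages (each lasting the full RTO) that fit in the budget."""
    return int(allowed_downtime_hours(uptime_percent) // rto_hours)
```

Under these assumptions, a 99% uptime goal leaves roughly 87.6 hours of downtime per year, so a system with a 12-hour RTO could absorb about seven worst-case outages before missing the goal, while a 24-hour RTO leaves room for only three.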

 

The benefit of this preplanning far outweighs any time you saved by skipping it and “hoping” it’ll be enough. Every year, thousands of businesses discover that their “hope” was indeed a poor plan when something takes their business out of operations and they scramble to get back online.

Unfortunately, when it comes to recovery, there are no second chances.