Sunday, April 08, 2018

What happened at Facebook with Cambridge Analytica

I have seen a few people asking questions who don't really understand the technology of data and marketing, so here is a rundown of what happened with Cambridge Analytica.

Firstly, there is the legitimate way Facebook uses and sells data for marketing purposes.

As a general rule, Facebook provides aggregated data to advertisers. If you lived in Oxfordshire in the UK, Facebook would aggregate the data of everyone it identifies as living in Oxfordshire. In accordance with general marketing data practice (a convention around minimum sample sizes, not a legal rule), it would then remove data that could easily identify a person, which risks further identification of actual people, and bundle up the remaining insights either as a data set or through its own software for marketers. Cambridge Analytica was different in two ways.

First, they engaged directly with US people's Facebook accounts using an app. I won't call any out; however, you can be reasonably assured that most apps on Facebook aren't there for your enjoyment, they are there to get at your data. Cambridge Analytica simply created an app and had people engage with it. It might have been guessing your age or writing funny captions for a photo, but it reeled in a significant number of people. When you connected to that app it asked for certain permissions, and it then officially had access to whatever data you agreed to share. Here is where the game changed. Second, Cambridge Analytica realised that, due to a bug, they could access not only the data you had agreed to share but additional data, and just about as much again from your friends. Many of us who don't live in the US have friends who do. Cambridge Analytica did not discriminate; they simply sucked down everything within their ability to access and poured it all into a data pond of their own making.

A couple of things: unless you sent credit card details to someone via a Messenger post, there is little risk of your credit card data being there. If your full home address is on Facebook, then you are at risk of having it in the pond of data Cambridge Analytica gathered.
Why did they go to that trouble, and what was the outcome?
Without providing a lesson in statistics (there are plenty freely available on the web), the whole exercise was to identify your associations and thereby profile you, particularly if you reside in the US. I am betting they did it for everyone whose data they held, with varying degrees of success. Why?
If you liked a lot of posts about environmental issues, it suggests you might not like people taking down environmental protection rules. How do I convince you that the person planning to take down the environmental laws is the good guy? I make sure I feed you information showing why the current rules favour someone who is not you (you are missing out). This is the sneaky way in which they targeted people with ads or fake news items designed to alter their perception. If you can create cognitive dissonance in someone, you have a reasonable chance of changing their mind.
This reflects an ongoing trend in business: a customer's data is treated as more important than the customer, because once a company has it, it has it potentially for life.

GDPR will put pressure on operators in Europe, and the rest of the world needs to follow the European standard. Countries which don't have data provisions similar to GDPR need to start moving, or their governments should expect to be annihilated at the next elections for failing to protect their citizens from such unruly behaviour and such disregard for users' data.

To recap: there was a breach in the way Facebook was being accessed through the provided interface. This allowed Cambridge Analytica to access far more data than they ever should have been allowed to.
This was a major failure in Facebook's engineering and their privacy and security practices.
This data then appears to have been misused to target ads at people. It is quite possible that someone you sit next to at work was getting a very different political message from you and your friends, with similar splits across other cohorts from your workplace, and possibly user groups you share membership of.

Facebook is in hot water in a variety of countries right now, including the hearing before the US Congress next week.

See ya round


Wednesday, January 03, 2018

Azure Internal Load Balancer Configuration - SQL Server Failover Cluster Instance on Azure Virtual Machines

I have just finished working with a colleague, resolving some issues with creating an IaaS SQL Server cluster in Azure. It took some trial and error, and there are some real gaps in the information out there; hopefully this will help fill one of them.

Let me first start by saying that, due to a lack of free diagnostics within Azure, you will need access to network insights. This isn't something you would do with an MSDN subscription, and you might baulk at it with your own paid-for one. Why not with MSDN? Because with MSDN you can only use a minimal amount of the Network Watcher resource, and you require Network Watchers to do any diagnostics at the network level. I hope you have a friendly boss who will give you space to do some learning and develop these needed Azure skills, or alternatively you have a bit of budget to run your own subscription and pay for resources. Make sure you have budget alerts so you don't blow out your costs, and delete the Network Watchers as soon as you have a working system.

We had to investigate the linkage between the Azure load balancer and the SQL Server Cluster and Network addresses on the SQL Server cluster. To see what was happening we needed App Insights and Network watchers.

What was the scenario?

We had a SQL Server cluster without Always On services enabled yet. We found that whilst we could connect from the node on which SQL Server was running, we couldn't get a connection from any other server or system in our subnet. The problem was well hidden, as you can't do much to find out why you are seeing the behaviour or where the fault lies.
We installed Wireshark. Yes, it is your best friend here, and yes, it is telling the truth, even when it stubbornly isn't seeing what you expected. We couldn't see any traffic when we tried connecting to the cluster: we got the outbound packet to initiate a connection and then "crickets", nothing responding from the cluster or the load balancer.

Let's go back a step: one of the items in your list of tasks when doing this is to create the SQL cluster and then an internal load balancer.

Let me tell you, the instructions on how to configure the internal load balancer in all the Microsoft documents, and in every blog post I came across, were terrible on the detail.

The load balancer requires a health probe, and that requires an active port on each individual node to validate its availability.

People have listed ports around the 50k mark, things like 52486 or 62159. Here is the missing bit: it has to be an active service running on your server, and it can't be anything to do with SQL Server, because SQL Server's ports are bound to the cluster IP address and are not reachable via the individual node IP address.

How do you work out a port? Two options. If you have a reason, or don't overly care, you can install IIS and you will have port 80. Alternatively, run netstat -an and have a look; you will get entries like this:

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:445            0.0.0.0:0              LISTENING

Use one of the ports near the start of the list with a Local Address starting with 0.0.0.0 for your backend health probe; these are listening on all IP addresses on your system.

Once you configure the load balancer with this health probe port, you will find your clustered SQL Server becomes available to the other services in your Azure space. You will now be able to use SQL Server Management Studio to connect to the active node in the cluster on the cluster IP address.
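As a sketch only (the resource group, load balancer, frontend and backend pool names below are placeholders, and the probe port is whatever you chose above), the probe and rule can be created with the Azure CLI along these lines. Note --floating-ip true, which SQL Server FCI listeners require so the backend sees the cluster IP:

```shell
# Health probe: a plain TCP connect on a port your nodes actually listen on
az network lb probe create \
  --resource-group MyRG --lb-name MyILB \
  --name SqlProbe --protocol tcp --port 59999

# Load-balancing rule for SQL Server traffic, tied to that probe.
# Floating IP (Direct Server Return) is required for FCI listeners.
az network lb rule create \
  --resource-group MyRG --lb-name MyILB \
  --name SqlRule --protocol tcp \
  --frontend-port 1433 --backend-port 1433 \
  --frontend-ip-name MyFrontEnd --backend-pool-name MyBackendPool \
  --probe-name SqlProbe --floating-ip true
```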


Thursday, December 14, 2017

Brisbane Yow 2017 Review

Last week I attended Yow for the first time. It provided some great talks in my broader interests of data and analytics, which is the space I currently consult in. Thanks to my employer Readify for the tickets and the opportunity to attend. We have twenty professional development days a year, and that was two of them taken.

Day 1

Day one started with a great keynote from Dr Denis Bauer of the CSIRO, talking about the challenges of working with big data in the form of the human genome. Denis was joined by Lynn Langit, and they worked through explaining the project and how it all fitted together. If you didn't know, the genome sequence in data form requires a database table with 3 billion columns, yes that is 3 with a B. Currently there is no relational database with the capability to create a table that wide (pity the poor designer or DBA required to model that one). Of course, this is a big data problem of a significantly large order of magnitude. Denis spoke about how she and the team had set up workloads in AWS to process genomic datasets, delivering real opportunities to identify relationships between people's genomes and find markers for genetic conditions. There are many challenges and great opportunities. I was lucky enough to get some time to have a chat with Denis later in the day, and whilst it is exciting, there are some real issues to deal with, such as misuse and abuse from a variety of parts of society. I enjoyed the talk and then the conversation later. A fascinating woman with a fantastic mind.

I then attended a talk about problems with Agile, delivered by Jeff Patton. As Readify, where I currently work, has a very agile approach to the way we work, I was interested to hear about the supposed problems and their remediation. Jeff makes very effective use of a style akin to the old writing-on-slides with an overhead projector. It was engaging; I learned a few things about the place of the product owner and how we as participants in the Agile community can help our product owners be better.

Next in my day was AWS Security by Aaron Bedra. Aaron made many good points about securing the cloud and its services, including one I wholeheartedly agree with: cloud done right is more than likely much more secure than many data centres. I learned a few things and was reminded to check some work for a current client.
Getting a system up in the cloud can be very fast compared to a traditional data centre; however, with that speed comes a number of risks. Aaron spoke about the checklist of things you can do to make sure your approach to security is sound.

Next on my day was Jim Webber. As a DBA I am always interested in database technology, and as Neo4j has a strong market presence and SQL Server now includes a graph database, this was an opportunity to learn more. I had a few items of basic knowledge reinforced, and then Jim went on to talk about consistency in large-scale databases and what they had changed to handle it: the use of causal consistency and a causal clustering architecture to deliver better throughput, large-scale clustering and a method to maintain the integrity of data in the database. I totally enjoyed expanding my knowledge of graph databases.

The day was progressing, and next up was Chanuki Illushka Seresinhe. Chanuki spoke about what it is about beauty that makes us happy, and whether it is possible to quantify it with deep learning. It was interesting to learn more about some of the concepts of deep learning. Chanuki also spoke about the fact that there are limited large datasets in some domains, which makes it hard to test in other regions and other domains. Even with the large dataset she had access to from Scenic or Not, there were large gaps in information which made the dataset less than ideal. This potentially causes all sorts of biases, one of the very real problems with computer AI.

Next, my afternoon continued with more computer learning and two great talks on machine learning.
First up was Julie Pitt. Julie spoke about the issues with training AI: biases and problems with algorithms. The key piece of her talk was about framing the AI problem right, and discussing why it is way past the time we thought we would have robots in our homes, and yet the problems which stop that happening are still present. Julie is reframing the problem to build self-learning robots which adapt to ever-changing environments. Her work looks at simple problems, like making sure a robot won't assume the shortest path is correct, i.e. that jumping from the second floor to the patio is the quickest way down. As a kid I grew up reading sci-fi books and Asimov; the fact that some of these problems have been well understood since then means we have work to do. Julie went on to show how the concept of a zone where the robot survives, with part of its job being to learn and maintain its survival, was a really interesting idea to unpack. She also spoke about biases and wrongful outcomes. I was lucky enough to speak to Julie afterwards at the networking drinks about some of her presentation, and she is a wonderfully engaging person to speak with about her discipline. Oh, and apparently I might have to learn Scala.

The other machine learning presentation I attended on the day was from Jennifer Marsman. Jennifer took us on a journey of capturing data in a novel way with the EPOC EEG headset, and analysing the data from it to deduce whether we could use brainwave patterns to identify lies. Jennifer was engaging and spoke with great humour to convey her message. One of the key problems I frequently encounter across data work in all disciplines is data quality: the headset needed to be set up correctly on the volunteer to obtain consistent, quality readings that could verify the data. Once again I was able to spend some time speaking to Jennifer about her data research and the ML capabilities. She spoke about the use of Azure ML and gave a few very quick insights into understanding the ML algorithms available and methods of training in Azure ML, or any ML system.

We then wrapped up the talks of the day with Dave Farley talking about software engineering and whether the term is right to describe what developers do. Dave spoke at length about the skills in other disciplines of engineering. Should software engineers experiment? Dave said yes, and explained that it is a frequent part of civil engineering; for example, using models to wind-tunnel test the design of a high-rise is a form of experimentation, done to minimise risk and manage eventual building costs. Dave went on to talk about where software development sits in terms of the levels other industries are at. He then talked about defining what engineering is and isn't, and how the work we do can in fact be a discipline of engineering; we just have to get some things right, and we are not doing that now.

Networking drinks and hors d'oeuvres ended the day. I caught up with a few speakers, notably Jennifer and Julie; as a data person, what they were doing was of great interest. I also spoke to Denis this evening. I was spoilt to have some time talking with these women.

Day 2

The second day opened with a bang: Linda Liukas opened to tell us about Hello Ruby and teaching young children about computers and computing concepts. The Hello Ruby books are really an amazing creation, and what Linda has done is fantastic. Concepts covered include learning how a loop feels; as for Ruby's favourite loop, I will let you buy the books to find out. If you have young kids around, or if you just want to have fun learning about computers in a non-threatening way, these books are for you. Linda is fascinating to speak to one on one; we talked about adding the Hello Ruby books and activities to local daycare activities. I am certainly adding them to my library. Possibly my favourite speaker and talk of Yow.

The second stop of the day was the blue room and Sara Chipps, with the question: do you believe an 8-year-old girl can program in C++? Let's talk about Jewelbots. Sara has designed and developed an Arduino-based bracelet for girls. They are rather simple-looking devices, but as Arduino devices they pack a punch, not so much in what they can do but in what they deliver. Due to the simple design and compact space, the Jewelbot couldn't house a compiler for higher-level languages. Instead, the owner, when she wants to program, writes C++ and then bootstraps the device with her new code. A young woman, and yes, 8 years old, did some live coding to configure a device. She was a champion and dealt with technical issues with grace and charm. Her parents should be proud, and her school as well.

Third stop, and off to hear about Dynamic Reteaming from Heidi Helfand. This was a really interesting talk on handling the problems of building and reconfiguring teams; no team stays the same. No matter how long it has been together, the whole team will change at some time: someone leaves or is promoted. Heidi provided a lot of great examples from her experiences of reteaming and some ideas on how to make it work, even letting people choose their own teams. Some great insights into human dynamics and teams.

After lunch, another keynote, this time from Gregor Hohpe. He talked about enterprise architecture, discussing a number of problems and some solutions. As an EA himself, he ripped into those who sit in ivory towers and produce colourful diagrams which are often thought of as meaningless in the world of day-to-day operations and project teams. He then talked about various patterns in architecture, and I went straight out the next day to review a few things in light of his comments. I have been working in a Solution Architect role, amongst other titles, on my current project. I enjoyed what he was talking about as it fits with a lot of what I think about the EA role; that probably comes from using PEAF as my preferred architecture methodology/framework.

Next up, I listened to Katrina Owen talk about her accidental open source project and all the problems that come when you become a maintainer. Katrina is the maintainer of a coding education site she created out of a need to make it easier to test and challenge the students she was teaching in a coding program. Much of what tore her up in trying to fix problems as a maintainer was people issues: dealing with competing priorities, maintaining balance, and sorting out just enough to avoid burnout, which for a period she didn't. One of Katrina's lessons: "What are you not going to do today?" That is something we all need to learn. Another: know who or what you are doing your thing for, and who matters, because otherwise everyone's opinion is right. Another great talk.

Unfortunately, that is where my Yow day ended with speakers. I had to attend a conference call which went way too long; however, it served a purpose in my project and was needed to get some things rolling.
I did get to finish up the day with a beer and network with a whole lot of people. It was here that I was able to catch up with Linda, amongst other speakers and a number of other attendees.

Overall I had a great experience: caught up with a few old associates, made some new fledgling connections and got some time networking with great speakers. Jump over to the Yow site if any of the speakers interest you; the slides of the talks are up, with videos to come. Yow has links back to websites, LinkedIn and Twitter for the speakers.

Let's see who is coming to Yow next year before I decide whether to attend. I am sure there will be some great speakers, so it's hurry up and wait until they are announced.

Thursday, September 07, 2017

Serverless Local build in Docker - Part 1

I wanted to set up a serverless-local build to use on a friend's project. This allows a certain amount of localised testing before pushing changes to the AWS Lambda service.
I decided to build a CentOS VM and then add Docker; this allowed for a lot of portability.
Here are the steps I took

First, build the CentOS VM. That is pretty straightforward for most, and there are lots of good guides around. I use VirtualBox from Oracle as my VM service of choice.

Next, install Docker. Just follow along here and you shouldn't have much trouble installing Docker and getting a CentOS container running.
Install Docker
yum install docker
Don't forget to test it with
docker run hello-world

I then used this container config to install a CentOS container with Node already installed. This is a lot easier than building the entire Node and npm install within a bare container. That can be another post when I get it written up.
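If you would rather build the image yourself, a minimal Dockerfile along these lines does the job (the NodeSource setup URL and Node version here are assumptions — check the current NodeSource instructions for your target version):

```dockerfile
# CentOS base with Node.js installed via the NodeSource repository
FROM centos:7

# Add the NodeSource repo and install Node.js + npm (version is an example)
RUN curl -sL https://rpm.nodesource.com/setup_8.x | bash - \
    && yum install -y nodejs \
    && yum clean all

CMD ["node", "--version"]
```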

Once this is up, you can use the page here to install serverless-local into your project and do a lot of local testing.

Saturday, July 01, 2017

SQL Server Linux - a quick look

After a few changes, I now have a SQL Server Linux installation running in a totally Linux environment.

Whilst there is still a dearth of tools for Linux-based SQL Server, it is very doable and manageable.
I have found two tool choices to be mostly useful for managing SQL Server with no Windows present. It is going to be some time before we see Management Studio on Linux, if ever, so you are going to have to keep reviewing alternative tools with which to manage it.

The first is Microsoft's official tooling.

There are good instructions for all main distros to help you install sqlcmd and add the directory of the command files to your path.

Firstly, install .NET Core on your Linux installation (these notes are for Ubuntu).

Next, install the Microsoft tools package, mssql-tools.
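For Ubuntu, at the time of writing the install looked roughly like this (the repo URL shown is for 16.04 — check Microsoft's docs for your release before pasting):

```shell
# Add Microsoft's signing key and package repository (Ubuntu 16.04 shown)
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | \
    sudo tee /etc/apt/sources.list.d/msprod.list

# Install sqlcmd/bcp and the ODBC driver they depend on
sudo apt-get update
sudo apt-get install -y mssql-tools unixodbc-dev

# Put the tools on your PATH
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
```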

Once this is installed you are ready to go. Right now, if you are running Ubuntu 17.04, review the bug page for the tools and install the preview versions, as there are compatibility bugs which appear to be ironed out in those releases.

To access the data from some sort of GUI, you might like Visual Studio Code as an editor which can interact with the SQL Server database. You need to install the mssql extension.

The alternative I am using is DBeaver.
It is based on Eclipse and seems rock solid as a database management tool. Of course, without all the proprietary bits of Management Studio, your T-SQL skills are going to have to raise a notch. You will need to grab the latest JDBC drivers from Microsoft.

Now, one final thing with VS Code: it hasn't quite worked out Linux. Microsoft, there is no C: drive lettering in Linux file paths, and the slashes in the path names need correcting, but I guess we can live with that for now.

DBeaver file paths
VS Code file paths
Overall it is great to see SQL Server advancing on Linux, and I can see a lot of places using SQL Server Developer Edition on Linux to eliminate a bundle of cost from their non-production environments. It is a great step forward, and I look to the future when it goes full GA.

See ya round


Thursday, June 22, 2017

Virtual box and Dell XPS 15

Just acquired a Dell XPS 15 through a new job. I have been setting up a few Windows and Linux servers as VMs using VirtualBox.

There is a real problem with resolution when initially installing just about any guest OS: it makes it firstly hard to install and secondly difficult to work with.
Enter the world of scaling VM displays. This is a fantastic help; however, it comes with its own problems.

A problem with scaling is that the menu for the running VM you have focused on goes missing by default; therefore, you can't do things like installing Guest Additions whilst in that mode.

Friday, January 20, 2017

VirtualBox on Ubuntu 16.04 DNS Failure From Guest OS

I have a couple of VMs on my system running a variety of operating systems for testing and learning. Recently I upgraded the host system to Ubuntu 16.04 when I installed a new SSD. I moved the VMs from the old spinny disk, and the VMs start fine.
Yay! I can log into the VMs, but I soon realise there is a problem with DNS. I can ping outside hosts by IP address, and all seems fine with the network. I started changing network settings in one of the guest OSes and no, nothing is happening.

After some investigation, this appears to be due to a change in how Ubuntu does DNS, which the guest OSes are unable to handle natively.

The cure:
Enable two parameters, as the user you run your VMs under, then check the NAT network list:

VBoxManage modifyvm "Centos 7" --natdnsproxy1 on
VBoxManage modifyvm "Centos 7" --natdnshostresolver1 on
VBoxManage natnetwork list
NAT Networks:

Name:        NatNetwork
IPv6:        No
Enabled:     Yes

Make sure all VMs are stopped and then run this; it will crash VirtualBox, but it makes sure that next time you start VirtualBox the new network settings will apply:

VBoxManage natnetwork stop --netname "NatNetwork"