December – 2018 – Robert M. Lee

Attribution is not Transitive – Tribune Publishing Cyber Attack as a Case Study

December 31, 2018

I posted a number of tweets on this subject, but then the voice of Richard Bejtlich entered my head and told me that all Twitter threads should be blog posts, so here I am. This post looks at the cyber attack on Tribune Publishing, and the claims that North Korea is responsible, as an opportunity to highlight that attribution is not a transitive property.

A thread on attribution as it pertains to Ryuk, Tribune Publishing, Lazarus Group, alliances, and operators vs developers.

— Robert M. Lee (@RobertMLee) December 31, 2018

Shortly after Tribune Publishing lost operations and the ability to print papers, the press reported that there had been a cyber attack and described it as a targeted attack by a nation-state. All of this traced back to one anonymous insider at the company talking to the media. Thus, early on I, and many others on social media, called for calm and patience while the details became public.

The details are still not public and the company hasn’t officially responded, but an insider told media sources that the malware used in the attack was Ryuk, which is a family of ransomware (Check Point did a great write-up on it here). Check Point’s analysis noted that some aspects of Ryuk have commonality with another family of malware called Hermes. They appropriately highlight that while use of Hermes has been attributed to Lazarus Group before, there are alternative explanations, including the group who developed Ryuk having access to the source code of the Hermes malware. There are likely alternative hypotheses not explored here as well. However, that link is seemingly being used by others to draw the conclusion that Tribune Publishing was attacked by North Korea.

Here is Forbes making that claim. They are not the only ones though. Fortunately, some journalists took a different approach. Here the New York Times accurately notes that just because (and if) Ryuk was used doesn’t mean the attack has government ties (thanks David and Nicole!). They also introduce alternative hypotheses, including Adam Meyers from CrowdStrike stating that CrowdStrike tracks an Eastern European cybercrime group that leverages the malware.

So what is the logic that led to the North Korean claims and what lessons can we extract?

Seemingly, the logic leveraged was that Ryuk has a link to Hermes. Hermes has links to Lazarus Group. Lazarus Group has been attributed to North Korea. Therefore, all uses of Ryuk must be North Korea. That is transitive attribution and is an association fallacy.

The logic seems kind of sound though, so what’s the problem?

There are a few large issues at play for us to explore.

First, we all have a collection bias. That is, what we analyze is based on what we collect. We cannot know the true extent of collection available, so it is common for analysts to assume their collection is pretty good in comparison to what’s available. In fact, it’s almost always the opposite: our collection is much worse than we realize. If Ryuk or Hermes malware were leveraged by teams other than Lazarus, that would pose a big issue for the attribution claims. The “uniqueness” of malware is directly tied to collection. With perfect collection you could factually state whether malware is unique to one team or not. But without perfect collection, and no one has perfect collection, we must understand that malware may appear unique to a specific team but may not be unique to them at all. It may just be unique to them in our collection.
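To make the collection point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the family names, the team names, and the "ground truth" are invented for illustration, and in reality no analyst ever gets to see the ground truth table.

```python
# Hypothetical ground truth (never visible to a real analyst):
# which teams actually use each malware family.
true_users = {
    "FamilyX": {"TeamA", "TeamB", "TeamC"},
    "FamilyY": {"TeamA"},
}

# Hypothetical collection: the (family, team) intrusions we happened to observe.
observed = [
    ("FamilyX", "TeamA"),
    ("FamilyX", "TeamA"),
    ("FamilyY", "TeamA"),
]


def apparent_users(observations):
    """Group the observed intrusions into family -> set of teams seen using it."""
    users = {}
    for family, team in observations:
        users.setdefault(family, set()).add(team)
    return users


def looks_unique(users, family):
    """True if exactly one team is associated with the family in this data."""
    return len(users.get(family, set())) == 1


in_collection = apparent_users(observed)
print("FamilyX unique in our collection:", looks_unique(in_collection, "FamilyX"))
print("FamilyX unique in reality:", looks_unique(true_users, "FamilyX"))
```

In this toy data, FamilyX looks unique to TeamA only because the intrusions by the other two teams never made it into the collection.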

Second, the links Check Point drew were not definitive, and they identified other hypotheses. Their blog was malware analysis, not attribution. If an assessment is going to be made for attribution purposes, we’d need to do a more structured assessment that includes more data sources, such as additional intrusions, and cluster those intrusions using some model like the Diamond Model for Activity Groups. Moving from malware usage to a cluster of intrusions would reveal more data points to then start working on a more structured assessment.

Third, I have not looked deeply into Hermes but we’d want to explore the connections Hermes had to Lazarus Group and do the same type of analysis on the links mentioned in the second point for Ryuk.

Fourth, Lazarus Group is a collection of clusters of intrusions from across multiple researchers, teams, and organizations. Whereas Lazarus Group was at one point decently well defined, to many it has come to represent a larger clustering of anything North Korean in nature with links to any aspect of known Lazarus Group activity. This is not a put-down of any team that tracks the Lazarus Group; it’s simply a recognition that the analyst bias that goes into selecting intrusions and putting them into the Lazarus Group is applied differently across different teams, and thus a super group is not likely to be granular in its accuracy. (I talk about the problem with this type of threat tracking here.)

This may not matter at all if you want to attribute the principal group responsible for Lazarus Group as North Korea. But attribution, especially when you want to attribute all parties and not just the one chiefly responsible, is not binary. Not every single intrusion that goes into the clustering of “Lazarus Group” is going to be accurate. Not every single intrusion is going to be North Korea developed malware used by North Korea operators. There are alliances (North Korean allies in other states or organizations), supply chains (where they source exploits, code, etc.), operators vs. developers, different operations teams, different customers of the operation’s intelligence requirements, and so on to consider.

All this means that if you want to attribute Lazarus Group to North Korea you can get to a pretty good confidence level (likely Moderate Confidence if you’re basing it only on intrusion analysis). If you want to reverse engineer individual aspects of that grouping, though, the attribution wouldn’t necessarily hold. That is, individual families of malware, intrusions, aspects of malware such as encoding routines, etc. could all be important puzzle pieces in multiple puzzles, not just Lazarus Group.

The fourth point is the biggest hindrance to attribution being transitive. All the puzzle pieces that go into an assessment can be important, but by themselves they likely are not. I’ve seen so many people ask for the “smoking gun” when talking about intelligence analysis. The FBI’s attribution of the Sony attack to North Korea comes to mind (which I wrote about here in Wired): the FBI’s assessment was sound, but the infosec community wanted them to “prove it”, so they released some technical pieces of evidence which to the FBI probably seemed pretty good in hindsight but to the public were not conclusive. This is a common analyst mistake. When you do analysis there are pieces of evidence that become really important to you, but only in the context of all the analysis you did. That is, a piece is really important to you now, with all the knowledge you have about the case, but you needed a lot of other data and context for it to become important. So releasing just “the important stuff” externally will likely not resonate with others, who cannot come to the same conclusions you did on partial data. Even with identical data sets two analysts will likely come to different conclusions anyway. To address this, never get in the habit of arguing about evidence; position and argue your assessment, which is the totality of the data and your analysis, not just pieces of data.

All this is a roundabout way of saying that if you take a piece of data from an assessment (such as links to Hermes malware) and separate it from all the other data, you cannot carry the assessment along with that piece of data. You cannot simply look for Hermes malware to pop up and go “yup that’s Lazarus Group”. Further, links from Hermes to other malware families like Ryuk, and thus to attacks where Ryuk shows up, complicate the issue even more. The more analytical leaps you make, the less likely your assessment is going to be sound.
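One rough way to picture the cost of each analytical leap is to put a number on every link in the chain and see what happens when all of them have to hold at once. The sketch below is a toy model, not a real confidence methodology: the per-link values are made up, and treating the links as independent probabilities that multiply is a simplifying assumption of mine, not something taken from any of the reporting.

```python
from math import prod

# Hypothetical confidence in each link; the numbers are invented for
# illustration and do not measure anything real.
links = {
    "Ryuk shares code with Hermes": 0.9,
    "Hermes was used by Lazarus Group": 0.7,
    "Lazarus Group is North Korea": 0.8,
    "This Ryuk intrusion was run by those same operators": 0.5,
}


def chained_confidence(link_confidences):
    """Confidence in the whole chain if every link must hold and the
    links are treated as independent (a simplifying assumption)."""
    return prod(link_confidences)


values = list(links.values())
# Confidence remaining after each additional leap in the chain.
running = [chained_confidence(values[:i]) for i in range(1, len(values) + 1)]

for claim, remaining in zip(links, running):
    print(f"{remaining:.3f} after adding: {claim}")
```

Even with fairly generous numbers on each individual link, the chained claim ends up much weaker than any single link, which is the point about analytical leaps.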

This doesn’t mean that the attack wasn’t done by North Korea. If it was, knowing their intention would be an entirely different and especially difficult assessment to make. But at this point, no actual assessments have been done. The only thing being highlighted in certain media outlets is transitive attribution because of links observed in different malware families. This is sloppy and will lead to numerous inaccuracies. Additionally, there can be political issues if high-profile targets like the New York Times and Wall Street Journal come out and attribute the attack to North Korea (luckily they haven’t). That puts pressure on the US government as well as the North Korean government. There are real impacts to attribution claims between states.

In summary, as an analyst you should be aware that assessments do not often have a transitive property. Understand your collection biases and what goes into the assessments you make. From there, if you need to make a new assessment, you need to go through the process of collecting data, analyzing it, and producing an assessment. Shortcuts such as transitive analysis will not be better than a low confidence assessment. Do not strive for perfection to the point of analysis paralysis (sometimes it’s ok to make gut calls as an analyst), but understand when something is a guess, when it is a hypothesis that’s missing plenty of data sources and for which other hypotheses are equally possible (low confidence), and when you’ve done structured analysis across multiple data sources to achieve a higher level of confidence (moderate or high).
