SCALA DUMP ZONE
Helper page for Scala snippets
-
XML is primarily used as a way of storing data, and looks a bit like HTML but with custom tags:
<ParentTag>
  <ChildTag>
    <BabyTag name="Billy">Inner Text</BabyTag>
  </ChildTag>
</ParentTag>
XML literals are built into Scala's syntax, and writing them is very easy. There are two main types that you can assign XML to in Scala: Elem and NodeSeq. There is also Node, which is one single "tag" inside a NodeSeq.
scala> val xml = <ParentTag>
     |   <ChildTag>
     |     <BabyTag name="Billy">Inner Text</BabyTag>
     |   </ChildTag>
     | </ParentTag>
xml: scala.xml.Elem =
<ParentTag>
  <ChildTag>
    <BabyTag name="Billy">Inner Text</BabyTag>
  </ChildTag>
</ParentTag>

scala> val xml2: scala.xml.NodeSeq = <ParentTag>
     |   <ChildTag>
     |     <BabyTag name="Billy">Inner Text</BabyTag>
     |   </ChildTag>
     | </ParentTag>
xml2: scala.xml.NodeSeq =
<ParentTag>
  <ChildTag>
    <BabyTag name="Billy">Inner Text</BabyTag>
  </ChildTag>
</ParentTag>
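As a rough illustration of how the three types relate, here is a minimal sketch. It assumes the scala-xml library is on the classpath; the sample element and the names elem, asSeq, bs and first are made up for this example.

```scala
import scala.xml.{Elem, Node, NodeSeq}

// An XML literal is typed as Elem.
val elem: Elem = <a><b>1</b><b>2</b></a>

// An Elem is a Node, and a Node is itself a NodeSeq (of length one),
// so it can be assigned to the wider type without any conversion.
val asSeq: NodeSeq = elem

// Searching returns a NodeSeq, which may hold zero, one or many Nodes.
val bs: NodeSeq = elem \ "b"

// Each item inside a NodeSeq is a single Node.
val first: Node = bs.head
```

This is why most search results in the examples on this page are typed as NodeSeq rather than Elem.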
You can search through XML in Scala for various things:
Inner Text
scala> (xml \\ "BabyTag").text
res0: String = Inner Text
If there is no match (i.e. the Node doesn't exist or is empty), an empty String is returned rather than an error being thrown. You can then check that a value exists with something like if (!value.isEmpty).
Tag Attributes (with @)
scala> (xml \\ "BabyTag" \ "@name").text
res1: String = Billy
Note: (xml \\ "BabyTag" \ "@name").text == xml \\ "BabyTag" \@ "name"
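To make the "no match gives an empty String" behaviour concrete, here is a small sketch. It assumes scala-xml is on the classpath; the document doc, the tag NoSuchTag and the attribute age are hypothetical and only there to force a miss. It also shows headOption as one possible way to get an Option instead of an empty String.

```scala
import scala.xml.Elem

// Hypothetical sample document, mirroring the one used above.
val doc: Elem =
  <ParentTag>
    <ChildTag>
      <BabyTag name="Billy">Inner Text</BabyTag>
    </ChildTag>
  </ParentTag>

// A search that matches nothing yields an empty NodeSeq, so .text is "".
val missingText: String = (doc \\ "NoSuchTag").text

// headOption gives "first match or None" rather than an empty String.
val babyText: Option[String] = (doc \\ "BabyTag").headOption.map(_.text)
val noText: Option[String]   = (doc \\ "NoSuchTag").headOption.map(_.text)

// Attributes behave the same way: a missing attribute gives "".
val name: String        = doc \\ "BabyTag" \@ "name"
val missingAttr: String = doc \\ "BabyTag" \@ "age"
```

Using Option makes the "not found" case explicit at the type level, which can be easier to handle than remembering to check for an empty String.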
Getting the inner tags back
scala> (xml \\ "ChildTag")
res2: scala.xml.NodeSeq =
NodeSeq(<ChildTag>
    <BabyTag name="Billy">Inner Text</BabyTag>
  </ChildTag>)
You can then .map over each individual node if you want to do things like creating a List of values:
scala> val xmlWithNamespace = <ParentTag>
     |   <ChildTag>My Text</ChildTag>
     |   <ChildTag>My Text2</ChildTag>
     | </ParentTag>
xmlWithNamespace: scala.xml.Elem =
<ParentTag>
  <ChildTag>My Text</ChildTag>
  <ChildTag>My Text2</ChildTag>
</ParentTag>

scala> (xmlWithNamespace \\ "ChildTag").map(_.text)
res3: scala.collection.immutable.Seq[String] = List(My Text, My Text2)
Getting a Node's namespace
scala> val xmlWithNamespace = <ParentTag>
     |   <ChildTag xmlns="https://www.example.com">My Text</ChildTag>
     | </ParentTag>
xmlWithNamespace: scala.xml.Elem =
<ParentTag>
  <ChildTag xmlns="https://www.example.com">My Text</ChildTag>
</ParentTag>

scala> (xmlWithNamespace \\ "ChildTag").map(_.namespace)
res4: scala.collection.immutable.Seq[String] = List(https://www.example.com)
Searching through all Nodes, including nested Nodes
Sometimes, you will want to get back all nodes (nested or not). To do this, search with the XML search wildcard _, i.e. (xml \\ "_").
For example, to count how many times each Node label occurs without knowing the structure of the XML beforehand:
scala> val xml = <family>
     |   <mother name="julie" />
     |   <father name="harold" />
     |   <child name="billy" status="good child" />
     |   <child name="charlie" status="good child" />
     |   <child name="mandy" status="bad child" />
     |   <child name="nigel" status="bad child" />
     |   <extendedfamily>
     |     <uncle name="jeff" />
     |     <auntie name="vicky" />
     |     <cousin name="little boy 1" />
     |     <cousin name="little boy 2" />
     |   </extendedfamily>
     | </family>
xml: scala.xml.Elem =
<family>
  <mother name="julie"/>
  <father name="harold"/>
  <child name="billy" status="good child"/>
  <child name="charlie" status="good child"/>
  <child name="mandy" status="bad child"/>
  <child name="nigel" status="bad child"/>
  <extendedfamily>
    <uncle name="jeff"/>
    <auntie name="vicky"/>
    <cousin name="little boy 1"/>
    <cousin name="little boy 2"/>
  </extendedfamily>
</family>

scala> val familyMap = (xml \\ "_").groupBy(_.label).map { case (k, v) => (k, v.size) }
familyMap: scala.collection.immutable.Map[String,Int] = Map(mother -> 1, auntie -> 1, uncle -> 1, child -> 4, extendedfamily -> 1, father -> 1, cousin -> 2, family -> 1)

scala> familyMap foreach {
     |   case (k, v) => println(s"$k count: $v")
     | }
mother count: 1
auntie count: 1
uncle count: 1
child count: 4
extendedfamily count: 1
father count: 1
cousin count: 2
family count: 1
View my answer on StackOverflow for more context and info on searching through XML.
-
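The searches above can also be combined to pull attribute values into ordinary Scala values and filter on them. The following is only a sketch: it assumes scala-xml is on the classpath, and the Child case class and the names family, children and goodChildren are hypothetical, introduced for this example.

```scala
import scala.xml.Elem

// Hypothetical cut-down version of the family document used above.
val family: Elem =
  <family>
    <child name="billy" status="good child" />
    <child name="charlie" status="good child" />
    <child name="mandy" status="bad child" />
    <extendedfamily>
      <cousin name="little boy 1" />
    </extendedfamily>
  </family>

// Hypothetical model type for one <child> element.
final case class Child(name: String, status: String)

// \ only looks at direct children, so cousins are not picked up here.
val children: Seq[Child] =
  (family \ "child").map(n => Child(n \@ "name", n \@ "status"))

// Once the data is in case classes, normal collection methods apply.
val goodChildren: Seq[String] =
  children.filter(_.status == "good child").map(_.name)
```

Mapping into case classes early keeps the string-based XML lookups in one place, so the rest of the code can work with typed values.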
In order to parse XML in a Scala Play application, you can take the request body either as type xml or as type anyContent. Example POST methods which take in an XML request body and send it back in the response are below:
def index: Action[NodeSeq] = Action(parse.xml) { implicit request =>
  val xml = request.body
  Ok(xml)
}

def handlePost(): Action[AnyContent] = Action(parse.anyContent) { implicit request =>
  // asXml returns None when the body could not be read as XML
  request.body.asXml match {
    case Some(xml) => Ok(xml)
    case None      => UnsupportedMediaType
  }
}
It is recommended to handle the request as anyContent so that you can handle the UnsupportedMediaType case manually (e.g. log the error, then redirect or return the appropriate response code).
-
Modifying nodes can become quite ugly and complicated, so the best way to do this cleanly is with a helper function. Below is a function which has multiple uses, along with the XML we will manipulate. Obviously, this can be split into multiple functions too.
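import scala.xml.{Elem, Node, NodeSeq}

def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty): NodeSeq =
  xml.foldLeft(NodeSeq.Empty) { (acc: NodeSeq, curr: Node) =>
    curr match {
      // Matching element: swap it for newNode (an empty newNode removes it)
      case elem: Elem if elem.label == search.label =>
        acc ++ newNode
      // Any other element: rebuild it, recursing into its children
      case elem: Elem =>
        acc ++ Elem(
          elem.prefix, elem.label, elem.attributes, elem.scope, elem.minimizeEmpty,
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode): _*)
      // Text, comments etc. are copied across unchanged
      case node =>
        acc ++ node
    }
  }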
val xml =
  <node1>
    <node2 xmlns="http://www.example.com">
      <node3>James dislikes XML</node3>
      <node4>XML is useful</node4>
    </node2>
  </node1>
-
Changing a Node's text
This searches for a Node and replaces it with any NodeSeq we give it. Input:
rewriteXml(xml, <node3/>, <node3>James loves XML</node3>)
Output:
res0: scala.xml.NodeSeq =
<node1>
  <node2 xmlns="http://www.example.com">
    <node3>James loves XML</node3>
    <node4>XML is useful</node4>
  </node2>
</node1>
-
Removing a Node
This strips a given Node out of the XML by replacing the given Node with nothing. Input:
rewriteXml(xml, <node3/>)
Output:
res0: scala.xml.NodeSeq =
<node1>
  <node2 xmlns="http://www.example.com">
    <node4>XML is useful</node4>
  </node2>
</node1>
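For comparison, the replace and remove cases can also be sketched with the RewriteRule and RuleTransformer classes that ship with scala-xml, instead of a hand-written fold. This is an alternative technique, not the helper used on this page. It assumes scala-xml is on the classpath, and the names doc, replaceNode3 and removeNode3 are made up for the example.

```scala
import scala.xml.{Elem, Node, NodeSeq}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Same shape as the XML manipulated above.
val doc: Elem =
  <node1>
    <node2 xmlns="http://www.example.com">
      <node3>James dislikes XML</node3>
      <node4>XML is useful</node4>
    </node2>
  </node1>

// Replace every <node3> element; leave all other nodes untouched.
val replaceNode3: RewriteRule = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "node3" => <node3>James loves XML</node3>
    case other                         => other
  }
}

// Remove every <node3> element by returning an empty sequence for it.
val removeNode3: RewriteRule = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "node3" => NodeSeq.Empty
    case other                         => other
  }
}

// RuleTransformer walks the whole tree and applies the rule to each node.
val replaced: Node = new RuleTransformer(replaceNode3).apply(doc)
val removed: Node  = new RuleTransformer(removeNode3).apply(doc)
```

The trade-off is that each behaviour becomes its own small rule rather than another flag on one function, which may or may not suit your use case.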
-
Duplicating a Node
To duplicate a Node, we will have to modify the original function (this modification will not break the other examples).
def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty,
               duplicateNode: Boolean = false): NodeSeq =
  xml.foldLeft(NodeSeq.Empty) { (acc: NodeSeq, curr: Node) =>
    curr match {
      // Matching element with duplicateNode set: emit it twice
      case elem: Elem if elem.label == search.label && duplicateNode =>
        acc ++ elem ++ elem
      case elem: Elem if elem.label == search.label =>
        acc ++ newNode
      case elem: Elem =>
        acc ++ Elem(
          elem.prefix, elem.label, elem.attributes, elem.scope, elem.minimizeEmpty,
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode): _*)
      case node =>
        acc ++ node
    }
  }
Input:
rewriteXml(xml, <node4/>, duplicateNode = true)
Output:
res0: scala.xml.NodeSeq =
<node1>
  <node2 xmlns="http://www.example.com">
    <node3>James dislikes XML</node3>
    <node4>XML is useful</node4><node4>XML is useful</node4>
  </node2>
</node1>
-
Adding a Node
To add a Node, we will have to modify the original function again. A new Node will be added below the search Node.
def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty,
               duplicateNode: Boolean = false, addNewNode: Boolean = false): NodeSeq =
  xml.foldLeft(NodeSeq.Empty) { (acc: NodeSeq, curr: Node) =>
    curr match {
      case elem: Elem if elem.label == search.label && duplicateNode =>
        acc ++ elem ++ elem
      // Matching element with addNewNode set: keep it and append newNode after it
      case elem: Elem if elem.label == search.label && addNewNode =>
        acc ++ elem ++ newNode
      case elem: Elem if elem.label == search.label =>
        acc ++ newNode
      case elem: Elem =>
        acc ++ Elem(
          elem.prefix, elem.label, elem.attributes, elem.scope, elem.minimizeEmpty,
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode): _*)
      case node =>
        acc ++ node
    }
  }
Input:
rewriteXml(xml, <node4/>, <node5>Here's an additional Node</node5>, addNewNode = true)
Output:
res0: scala.xml.NodeSeq =
<node1>
  <node2 xmlns="http://www.example.com">
    <node3>James dislikes XML</node3>
    <node4>XML is useful</node4><node5>Here's an additional Node</node5>
  </node2>
</node1>
-
Changing a Namespace
To change or add a Namespace, we will once again have to modify the original function.
def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty,
               duplicateNode: Boolean = false, addNewNode: Boolean = false,
               namespace: String = ""): NodeSeq =
  xml.foldLeft(NodeSeq.Empty) { (acc: NodeSeq, curr: Node) =>
    curr match {
      case elem: Elem if elem.label == search.label && duplicateNode =>
        acc ++ elem ++ elem
      case elem: Elem if elem.label == search.label && addNewNode =>
        acc ++ elem ++ newNode
      // Matching element with a namespace given: rebuild it with a new default namespace
      case elem: Elem if elem.label == search.label && namespace.nonEmpty =>
        acc ++ Elem(
          elem.prefix, elem.label, elem.attributes,
          NamespaceBinding(null, namespace, elem.scope), elem.minimizeEmpty,
          // namespace is passed on so that nested matches are handled the same way
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode, namespace): _*)
      case elem: Elem if elem.label == search.label =>
        acc ++ newNode
      case elem: Elem =>
        acc ++ Elem(
          elem.prefix, elem.label, elem.attributes, elem.scope, elem.minimizeEmpty,
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode, namespace): _*)
      case node =>
        acc ++ node
    }
  }
Input:
rewriteXml(xml, <node4/>, namespace = "https://james-work-account.github.io/")
Output:
res0: scala.xml.NodeSeq =
<node1>
  <node2 xmlns="http://www.example.com">
    <node3>James dislikes XML</node3>
    <node4 xmlns="https://james-work-account.github.io/">XML is useful</node4>
  </node2>
</node1>
-
-
It is probably easier to show how to do this rather than trying to explain it step by step.
import java.io.{ByteArrayOutputStream, StringReader}
import javax.xml.transform.stream.{StreamResult, StreamSource}
import javax.xml.transform.{Result, Source, TransformerFactory}
import scala.util.{Failure, Success, Try}
import scala.xml._

val xml: Elem = {
  <Message>
    <Header>
      <MessageDetails>
        <Name>James</Name>
        <Timestamp>2006-01-05T15:31:59.000</Timestamp>
      </MessageDetails>
      <SenderDetails/>
    </Header>
    <Body>
    </Body>
  </Message>
}

val ns = "http://www.example.com"

val xslt =
  s"""
     |<xsl:stylesheet version="1.0"
     |                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     |  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
     |
     |  <xsl:template match="Message">
     |    <xsl:element name="{local-name()}" namespace="$ns">
     |      <xsl:apply-templates select="node() | @*" />
     |    </xsl:element>
     |  </xsl:template>
     |  <xsl:template match="node() | @*">
     |    <xsl:copy>
     |      <xsl:apply-templates select="node() | @*" />
     |    </xsl:copy>
     |  </xsl:template>
     |
     |</xsl:stylesheet>
  """.stripMargin

def transformXml(xml: String): NodeSeq =
  Try {
    val xmlSource: Source = new StreamSource(new StringReader(xml))
    val outputStream = new ByteArrayOutputStream()
    val result: Result = new StreamResult(outputStream)
    val transformerFactory = TransformerFactory.newInstance
    val transformer = transformerFactory.newTransformer(new StreamSource(new StringReader(xslt)))
    transformer.transform(xmlSource, result)
    outputStream.toString.replaceAll(":?ns0:?", "")
  } match {
    case Success(xmlAsString)   => XML.loadString(xmlAsString)
    case Failure(ex: Throwable) => throw ex
  }

transformXml(xml.toString())
This example uses an XSLT which is loaded from a String, but loading from an external file isn't too different. This specific XSLT adds a namespace to the Message Node.
There are ways to transform XML using Scala libraries, but the simplest way I've found is to use the Java TransformerFactory. Most of the work is done for you behind the scenes; all you need to do is set it up correctly.
What you will need:
- XML to be transformed, as a String
- XSLT to be used, loaded using a StringReader in this case but can be loaded from an external file using a Java FileReader
In my example I have a replaceAll addition, as the actual output of this specific XSLT changes the Message node to <ns0:Message xmlns:ns0="http://www.example.com"> rather than <Message xmlns="http://www.example.com">; it is not necessary.
After the transformation is complete, you can turn the transformed XML into a NodeSeq using XML.loadString(your XML). I have put this transformation in a Try block so that if it fails at any point, the exception will be caught. In this example there is just one generic Failure(ex: Throwable) case, which simply rethrows the exception, but you can match on specific exceptions if you want to handle them differently (e.g. with different bespoke logging messages).
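As a rough sketch of the external-file variant mentioned above, the stylesheet can be handed to the transformer as a java.io.File. Only JDK classes are used here. The identity stylesheet, the temp file standing in for a real .xsl file, and the helper name transformWithFile are all assumptions made for this example.

```scala
import java.io.{File, StringReader, StringWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

// Hypothetical identity stylesheet: it copies the input through unchanged.
val stylesheet: String =
  """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    |  <xsl:output method="xml" omit-xml-declaration="yes" />
    |  <xsl:template match="node() | @*">
    |    <xsl:copy>
    |      <xsl:apply-templates select="node() | @*" />
    |    </xsl:copy>
    |  </xsl:template>
    |</xsl:stylesheet>""".stripMargin

// Written to a temp file here purely to stand in for a real .xsl file on disk.
val xsltFile: File = File.createTempFile("identity", ".xsl")
xsltFile.deleteOnExit()
Files.write(xsltFile.toPath, stylesheet.getBytes(StandardCharsets.UTF_8))

// Same flow as transformXml above, but the XSLT source is a File, not a String.
def transformWithFile(xml: String, xslt: File): String = {
  val transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(xslt))
  val out = new StringWriter()
  transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out))
  out.toString
}

val transformed: String =
  transformWithFile("<Message><Name>James</Name></Message>", xsltFile)
```

The only real difference from the String version is the StreamSource constructor used for the stylesheet; the rest of the set-up stays the same.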
-
Any application which accepts external XML with DOCTYPE declarations in it is vulnerable to XXE attacks by default. Thankfully, the Play Framework handles these attacks automatically by not allowing any DTD in XML POSTed to your application. Below is an example of some XML containing a DTD.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo [
  <!ELEMENT foo (bar)>
  <!ELEMENT bar (#PCDATA)>
]>
<foo>
  <bar>string</bar>
</foo>
This example is harmless, but it could be much more serious.
In order to test that your Play application is secure against XXE attacks, you can write the following Routes test:
import org.scalatestplus.play.PlaySpec
import org.scalatestplus.play.guice.GuiceOneAppPerSuite
import play.api.mvc.Call
import play.api.test.FakeRequest
import play.api.test.Helpers.{POST => POST_REQUEST, _}

class RoutesSpec extends PlaySpec with GuiceOneAppPerSuite {

  /**
    * Test to verify POSTing XML with DTD will fail due to default Application behaviour.
    * By default, Play blocks all XML with any DTD in it due to potential XXE vulnerability.
    * This means that the call will fall over at `Action.async(parse.xml)` (or equivalent).
    * `scala.xml.XML.loadString` has XXE vulnerability, so POSTing the XML as `scala.xml.Unparsed` gets around this.
    */
  "The Play Application" must {
    "not handle XXE XML" in {
      lazy val xmlWithDTD = scala.xml.Unparsed(
        """<?xml version="1.0" encoding="utf-8"?>
          |<!DOCTYPE foo [
          |<!ELEMENT foo (bar)>
          |  <!ELEMENT bar (#PCDATA)>
          |]>
          |<foo>
          |  <bar>string</bar>
          |</foo>
        """.stripMargin)

      // Replace "/your-app-route" with a route that accepts XML in your application
      val Some(result) = route(app, FakeRequest(Call(POST_REQUEST, "/your-app-route"))
        .withXmlBody(xmlWithDTD))

      status(result) mustEqual 400
      contentAsString(result) mustBe ""
    }
  }
}
Importantly, the usual loadString method is vulnerable to attacks, so loading the XML as a scala.xml.Unparsed gets around this.
If you are accepting XML without the security of the Play Framework, there are other ways to accept XML safely. My personal preference is to use a secure SAX parser when calling the loadString method.
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import scala.xml.XML

def secureSAXParser: SAXParser = {
  val saxParserFactory = SAXParserFactory.newInstance()
  // Do not resolve external general entities
  saxParserFactory.setFeature("http://xml.org/sax/features/external-general-entities", false)
  // Reject any document that contains a DOCTYPE declaration
  saxParserFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
  // Never fetch an external DTD
  saxParserFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  saxParserFactory.newSAXParser()
}

// yourXml is a placeholder for the XML String you want to parse
XML.withSAXParser(secureSAXParser).loadString(yourXml)
This protects against things like XXE attacks.
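To check that a parser configured like this really does reject a DTD, something along the following lines can be used. It is a sketch using only JDK classes, and it assumes the JDK's bundled parser, which recognises the Apache feature names. The helper parses and the two sample documents are hypothetical.

```scala
import java.io.StringReader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

// Same configuration as the secureSAXParser shown above.
def secureSAXParser: SAXParser = {
  val factory = SAXParserFactory.newInstance()
  factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  factory.newSAXParser()
}

// Hypothetical helper: true if the document parses, false if the parser throws.
def parses(xml: String): Boolean =
  Try(secureSAXParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)).isSuccess

val withDtd: String    = """<!DOCTYPE foo [<!ELEMENT foo (#PCDATA)>]><foo>string</foo>"""
val withoutDtd: String = """<foo>string</foo>"""

val dtdAccepted: Boolean   = parses(withDtd)
val plainAccepted: Boolean = parses(withoutDtd)
```

With the disallow-doctype-decl feature switched on, the document containing a DOCTYPE should fail to parse while the plain one goes through.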